Why does a single disk failure result in all the OSDs getting restarted in a node?
Issue
- Why does a single disk failure result in all the OSDs getting restarted in a node?
Oct 19 05:31:11 osd01 journal: xxxxxx-xx-xx 05:31:11.457148 7f35f643d700 -1 osd.xxx 616999 heartbeat_check: no reply from xxx.xxx.xxx.xxx:6819 osd.xxx since back xxxxxx-xx-xx 05:30:12.224488 front xxxxxx-xx-xx 05:31:12.224488 (cutoff xxxxx)
- When you have multiple disks connected to a single controller, the failure of a single disk can result in an HBA reset and, consequently, all the OSDs getting restarted. Is this expected?
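One way to confirm that this failure domain applies is to check whether the OSD data devices on the node all sit behind the same SCSI host (HBA). The Python sketch below groups block devices by the hostN element in their sysfs device path; the device names are placeholders, and the assumption that the OSD data devices are plain SCSI/SAS disks is not part of the original report (on a real node you would pass the devices reported by `ceph-volume lvm list`).

```python
#!/usr/bin/env python3
"""Sketch: group block devices by the SCSI host (HBA) they hang off.

Assumption: the default device names are placeholders; pass the actual
OSD data devices for the node as command-line arguments.
"""
import os
import re
import sys
from collections import defaultdict


def scsi_host_for(dev: str) -> str:
    """Return the SCSI host (e.g. 'host0') behind a block device, or 'unknown'."""
    # For SCSI/SAS disks, /sys/block/sdX/device resolves to a path that
    # contains the hostN element, e.g. .../host0/target0:0:3/0:0:3:0.
    path = os.path.realpath(f"/sys/block/{dev}/device")
    match = re.search(r"/(host\d+)/", path)
    return match.group(1) if match else "unknown"


def main(devices):
    by_host = defaultdict(list)
    for dev in devices:
        by_host[scsi_host_for(dev)].append(dev)

    for host, devs in sorted(by_host.items()):
        print(f"{host}: {', '.join(devs)}")

    if len(by_host) == 1:
        print("All listed devices share one controller; an HBA reset would "
              "affect every OSD on this node at once.")


if __name__ == "__main__":
    # Placeholder list; replace with the node's OSD data devices.
    main(sys.argv[1:] or ["sda", "sdb", "sdc"])
```

If the output shows a single hostN entry for every OSD device, the controller itself is a shared single point of failure, which is consistent with one failing disk triggering heartbeat failures and restarts across all OSDs on the node.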
Environment
- Red Hat Ceph Storage
- scsi
- osd