Why does a single disk failure result in all the OSDs getting restarted in a node?
Issue
- Why does a single disk failure result in all the OSDs getting restarted in a node?
Oct 19 05:31:11 osd01 journal: xxxxxx-xx-xx 05:31:11.457148 7f35f643d700 -1 osd.xxx 616999 heartbeat_check: no reply from xxx.xxx.xxx.xxx:6819 osd.xxx since back xxxxxx-xx-xx 05:30:12.224488 front xxxxxx-xx-xx 05:31:12.224488 (cutoff xxxxx)
- When you have multiple disks connected to a single controller, the failure of a single disk can result in an HBA reset and, consequently, all the OSDs getting restarted. Is this expected?
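One way to confirm that this failure domain applies is to check whether the OSD data devices on the node all sit behind the same SCSI host (HBA). The Python sketch below groups block devices by the hostN element in their sysfs device path; the device names are placeholders, and the assumption that the OSD data devices are plain SCSI/SAS disks is not part of the original report (on a real node you would pass the devices reported by `ceph-volume lvm list`).

```python
#!/usr/bin/env python3
"""Sketch: group block devices by the SCSI host (HBA) they hang off.

Assumption: the default device names are placeholders; pass the actual
OSD data devices for the node as command-line arguments.
"""
import os
import re
import sys
from collections import defaultdict


def scsi_host_for(dev: str) -> str:
    """Return the SCSI host (e.g. 'host0') behind a block device, or 'unknown'."""
    # For SCSI/SAS disks, /sys/block/sdX/device resolves to a path that
    # contains the hostN element, e.g. .../host0/target0:0:3/0:0:3:0.
    path = os.path.realpath(f"/sys/block/{dev}/device")
    match = re.search(r"/(host\d+)/", path)
    return match.group(1) if match else "unknown"


def main(devices):
    by_host = defaultdict(list)
    for dev in devices:
        by_host[scsi_host_for(dev)].append(dev)

    for host, devs in sorted(by_host.items()):
        print(f"{host}: {', '.join(devs)}")

    if len(by_host) == 1:
        print("All listed devices share one controller; an HBA reset would "
              "affect every OSD on this node at once.")


if __name__ == "__main__":
    # Placeholder list; replace with the node's OSD data devices.
    main(sys.argv[1:] or ["sda", "sdb", "sdc"])
```

If the output shows a single hostN entry for every OSD device, the controller itself is a shared single point of failure, which is consistent with one failing disk triggering heartbeat failures and restarts across all OSDs on the node.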
Environment
- Red Hat Ceph Storage
- scsi
- osd