Why does a single disk failure result in all the OSDs getting restarted in a node?

Issue

  • Why does a single disk failure result in all the OSDs getting restarted in a node? The OSD log shows heartbeat failures such as the following (a log-parsing sketch follows this list):

    Oct 19 05:31:11 osd01 journal: xxxxxx-xx-xx 05:31:11.457148 7f35f643d700 -1 osd.xxx 616999 heartbeat_check: no reply from xxx.xxx.xxx.xxx:6819 osd.xxx since back xxxxxx-xx-xx 05:30:12.224488 front xxxxxx-xx-xx 05:31:12.224488 (cutoff xxxxx)

  • When multiple disks are connected to a single controller, the failure of a single disk can result in an HBA reset and, consequently, all the OSDs on that node being restarted. Is this expected?
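The following is a minimal diagnostic sketch, not part of the verified solution: it tallies heartbeat_check "no reply" messages per peer OSD in an OSD log, so you can see which peers stopped responding around the time of the disk failure. The log path and the exact message format are assumptions based on the redacted example above; adjust them for your cluster.

    #!/usr/bin/env python3
    """Count 'heartbeat_check: no reply' messages per peer OSD in a Ceph OSD log.

    Assumptions (hypothetical; adjust for your environment):
      - log lines contain 'heartbeat_check: no reply from <addr> osd.<id>'
      - the log file path is passed as the first argument,
        e.g. /var/log/ceph/ceph-osd.<id>.log
    """
    import re
    import sys
    from collections import Counter

    # Matches the peer address and OSD id in a heartbeat_check message.
    PATTERN = re.compile(r"heartbeat_check: no reply from (\S+) osd\.(\d+)")

    def count_missing_heartbeats(path):
        """Return a Counter mapping 'osd.<id> (<addr>)' to message count."""
        counts = Counter()
        with open(path, errors="replace") as log:
            for line in log:
                match = PATTERN.search(line)
                if match:
                    addr, osd_id = match.groups()
                    counts["osd.{} ({})".format(osd_id, addr)] += 1
        return counts

    if __name__ == "__main__":
        if len(sys.argv) != 2:
            sys.exit("usage: {} <ceph-osd log file>".format(sys.argv[0]))
        for peer, count in count_missing_heartbeats(sys.argv[1]).most_common():
            print("{:6d}  {}".format(count, peer))

If most or all of the peer OSDs on the node appear at roughly the same timestamp, that pattern is consistent with a controller-level (HBA) reset rather than an isolated single-disk failure.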

Environment

  • Red Hat Ceph Storage
  • SCSI
  • OSD
