A large number of Ceph OSDs throughout the cluster are periodically going down and up
Issue
- OSDs periodically go down because they receive no reply during heartbeat_check. Messages like the following may be present in the OSD logs (a sketch for tallying these messages follows this list):
2016-07-25 19:00:08.906864 7fa2a0033700 -1 osd.254 609110 heartbeat_check: no reply from osd.2 since back 2016-07-25 19:00:07.444113 front 2016-07-25 18:59:48.311935 (cutoff 2016-07-25 18:59:48.906862)
- A large number of OSDs appear to be flapping, and they are not associated with a single OSD host inside the cluster.
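When many OSDs report heartbeat_check failures, it can help to tally which peers are being reported and by whom, to see whether the failures are spread across the cluster or concentrated on one OSD or host. The following is a minimal sketch, not part of the original article: it assumes copies of the OSD logs (for example from /var/log/ceph/ceph-osd.*.log) are available in the current directory and that the log lines follow the format shown above.

#!/usr/bin/env python3
# Tally "heartbeat_check: no reply" events across Ceph OSD log files.
import glob
import re
from collections import Counter

# Matches the reporting OSD and the peer it cannot reach, e.g.
# "osd.254 609110 heartbeat_check: no reply from osd.2 since ..."
PATTERN = re.compile(r"(osd\.\d+) \d+ heartbeat_check: no reply from (osd\.\d+)")

reported = Counter()   # peers reported as unresponsive
reporters = Counter()  # OSDs doing the reporting

for path in glob.glob("ceph-osd.*.log"):  # hypothetical local log copies
    with open(path, errors="replace") as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                reporters[match.group(1)] += 1
                reported[match.group(2)] += 1

print("Most frequently reported peers:")
for osd, count in reported.most_common(10):
    print("  {}: {}".format(osd, count))

print("Most frequent reporters:")
for osd, count in reporters.most_common(10):
    print("  {}: {}".format(osd, count))

If the counts are spread fairly evenly across many OSDs and hosts, that is consistent with a cluster-wide cause (for example network or load issues) rather than a single failing OSD.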
Environment
- Red Hat Ceph Storage 1.3.2