A large number of Ceph OSDs throughout the cluster are periodically going down and up
Issue
- OSDs periodically go down because their peers receive no reply during heartbeat_check. Messages similar to the following may be present in the OSD logs:
2016-07-25 19:00:08.906864 7fa2a0033700 -1 osd.254 609110 heartbeat_check: no reply from osd.2 since back 2016-07-25 19:00:07.444113 front 2016-07-25 18:59:48.311935 (cutoff 2016-07-25 18:59:48.906862)
- A large number of OSDs appear to be flapping; they are not associated with a single OSD host inside the cluster (see the example commands below for confirming this).
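A minimal way to gauge how widespread the heartbeat failures are, assuming the default log location of /var/log/ceph on each OSD host (adjust the path if your deployment logs elsewhere):

    # Count "no reply" heartbeat failures per OSD log on an OSD host;
    # non-zero counts across many OSDs and hosts match this symptom.
    grep -c 'heartbeat_check: no reply' /var/log/ceph/ceph-osd.*.log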
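To confirm that the flapping is not tied to a single host, check the current cluster state and review recent OSD failure/boot events in the cluster log on a monitor node. The exact wording of these cluster-log messages varies between releases, so the grep pattern below is only a starting point:

    # Current cluster health and which OSDs are down right now
    ceph -s
    ceph osd tree | grep -i down

    # On a monitor host: recent OSD failure/boot events in the cluster log
    # (default path assumed; message text varies by release)
    grep -E 'failed|boot|marked down' /var/log/ceph/ceph.log | tail -n 50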
Environment
- Red Hat Ceph Storage 1.3.2
