Large numbers of Ceph OSDs throughout the cluster are periodically going down and up


Issue

  • OSDs are periodically marked down because their peers receive no reply during heartbeat_check. Messages like the following may appear in the OSD logs:
2016-07-25 19:00:08.906864 7fa2a0033700 -1 osd.254 609110 heartbeat_check: no reply from osd.2 since back 2016-07-25 19:00:07.444113 front 2016-07-25 18:59:48.311935 (cutoff 2016-07-25 18:59:48.906862)
  • A large number of OSDs appear to be flapping, and they are not associated with a single OSD host in the cluster (see the diagnostic sketch after this list).
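
The following commands are a general diagnostic sketch for confirming the symptom, not the verified solution for this article; they assume admin access to the cluster and the default /var/log/ceph log location on the OSD hosts. They show whether down OSDs are spread across many hosts and whether the affected OSDs are logging heartbeat_check failures:

    # ceph status
    # ceph osd tree | grep down
    # grep heartbeat_check /var/log/ceph/ceph-osd.*.log

If the down OSDs and the heartbeat_check messages span multiple hosts rather than a single node, the flapping matches the cluster-wide pattern described in this issue.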

Environment

  • Red Hat Ceph Storage 1.3.2
