A large number of Ceph OSDs throughout the cluster are periodically going down and up
Issue
- OSDs periodically go down because they receive no reply during heartbeat_check. Messages like the following may be present in the OSD logs (a sketch for tallying these messages follows this list):
2016-07-25 19:00:08.906864 7fa2a0033700 -1 osd.254 609110 heartbeat_check: no reply from osd.2 since back 2016-07-25 19:00:07.444113 front 2016-07-25 18:59:48.311935 (cutoff 2016-07-25 18:59:48.906862)
- A large number of OSDs appear to be flapping, and they are not associated with a single OSD host inside the cluster.
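When many OSDs report heartbeat_check failures, it can help to tally which peers are being reported and by whom, to see whether the failures are spread across the cluster or concentrated on one OSD or host. The following is a minimal sketch, not part of the original article: it assumes copies of the OSD logs (for example from /var/log/ceph/ceph-osd.*.log) are available in the current directory and that the log lines follow the format shown above.

#!/usr/bin/env python3
# Tally "heartbeat_check: no reply" events across Ceph OSD log files.
import glob
import re
from collections import Counter

# Matches the reporting OSD and the peer it cannot reach, e.g.
# "osd.254 609110 heartbeat_check: no reply from osd.2 since ..."
PATTERN = re.compile(r"(osd\.\d+) \d+ heartbeat_check: no reply from (osd\.\d+)")

reported = Counter()   # peers reported as unresponsive
reporters = Counter()  # OSDs doing the reporting

for path in glob.glob("ceph-osd.*.log"):  # hypothetical local log copies
    with open(path, errors="replace") as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                reporters[match.group(1)] += 1
                reported[match.group(2)] += 1

print("Most frequently reported peers:")
for osd, count in reported.most_common(10):
    print("  {}: {}".format(osd, count))

print("Most frequent reporters:")
for osd, count in reporters.most_common(10):
    print("  {}: {}".format(osd, count))

If the counts are spread fairly evenly across many OSDs and hosts, that is consistent with a cluster-wide cause (for example network or load issues) rather than a single failing OSD.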
Environment
- Red Hat Ceph Storage 1.3.2