Ceph/ODF: MGR is unresponsive due to commands being blocked from the "volumes plugin" and "Ceph-MGR finisher thread".

Solution Verified

Issue

The Ceph-MGR is unresponsive due to commands being blocked from the volumes plugin and Ceph-MGR finisher thread.

The Ceph-MGR is not servicing requests, resulting in these symptoms:

  • Running ceph osd df tree hangs and never completes
  • The output from ceph status does not match reality
  • The PG state in ceph status does not match the PG state seen in ceph pg query
  • Ceph is slow because the MGR is not responding

The output of ceph daemon DAEMON_NAME perf dump shows a get_or_fail_fail count greater than 1000 in the throttle-mgr_mon_messsages section:

    "throttle-mgr_mon_messsages": {
        "val": 128,
        "max": 128,
        "get_started": 0,
        "get": 139,
        "get_sum": 139,
        "get_or_fail_fail": 10941044,   <-- Here
        "get_or_fail_success": 139,
        "take": 0,
        "take_sum": 0,
        "put": 11,
        "put_sum": 11,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000,
            "avgtime": 0.000000000
        }
    },
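
The check above can be sketched in a few lines of Python. This is only an illustration, assuming the perf dump JSON has already been loaded into a dictionary; the function name throttle_is_saturated and the trimmed sample data are hypothetical, and the 1000 threshold is taken from the symptom description rather than from any Ceph default.

```python
import json

# Threshold from the symptom description above (more than 1000 failed attempts).
# This is an assumption for illustration, not a Ceph-defined limit.
FAIL_THRESHOLD = 1000

# Counter section name exactly as it appears in the perf dump output above.
SECTION = "throttle-mgr_mon_messsages"


def throttle_is_saturated(perf_dump, section=SECTION, threshold=FAIL_THRESHOLD):
    """Return True if get_or_fail_fail in the given section exceeds the threshold."""
    counters = perf_dump.get(section, {})
    return counters.get("get_or_fail_fail", 0) > threshold


# Trimmed copy of the sample output shown above.
sample = json.loads("""
{
    "throttle-mgr_mon_messsages": {
        "val": 128,
        "max": 128,
        "get_or_fail_fail": 10941044,
        "get_or_fail_success": 139
    }
}
""")

if throttle_is_saturated(sample):
    print("MGR mon message throttle shows a high get_or_fail_fail count")
```

In practice the dictionary would come from the JSON printed by ceph daemon DAEMON_NAME perf dump instead of the inline sample.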

Environment

Red Hat OpenShift Container Platform (OCP) 4.x
Red Hat OpenShift Container Storage (OCS) 4.x
Red Hat OpenShift Data Foundation (ODF) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x
