Ceph: Cluster in HEALTH_WARN with 1 MDSs report slow requests and 1 MDSs behind on trimming.

Solution Verified

Issue

The cluster is in HEALTH_WARN and reports both "1 MDSs report slow requests" and "1 MDSs behind on trimming".

Example:

[pao@edon1 ~]$ ceph -s
  cluster:
    id:     8d23xxxx-Redacted-Cluster-ID-yyyya00794f2
    health: HEALTH_WARN
            1 MDSs report slow requests
            1 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum edon3,edon2,edon1 (age 6d)
    mgr: edon1(active, since 6d), standbys: edon1, edon1
    mds: 2/2 daemons up, 1 standby
    osd: 324 osds: 324 up (since 6d), 324 in (since 6d)

  data:
    volumes: 1/1 healthy
    pools:   13 pools, 10625 pgs
    objects: 101.79M objects, 51 TiB
    usage:   158 TiB used, 458 TiB / 616 TiB avail
    pgs:     10625 active+clean

[pao@edon1 ~]$ ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.cephfs.edon1.xxxyyy(mds.0): 569 slow requests are blocked > 30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.cephfs.edon1.xxxyyy(mds.0): Behind on trimming (431866/128) max_segments: 128, num_segments: 431866
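
The two detail lines above carry the numbers that matter for triage: how many requests are blocked, and how far num_segments has grown past max_segments. The following is a minimal Python sketch for pulling those values out of saved `ceph health detail` text. It assumes the detail lines have exactly the shape shown in the example output; the function name `summarize_mds_warnings`, the regular expressions, and the result keys are illustrative and not part of Ceph.

```python
import re

# Assumption: detail lines look exactly like the example output above.
# These patterns are illustrative, not an official Ceph output grammar.
SLOW_RE = re.compile(
    r"^\s*(?P<daemon>mds\.\S+?)\((?P<rank>mds\.\d+)\): "
    r"(?P<count>\d+) slow requests are blocked > (?P<secs>\d+) secs"
)
TRIM_RE = re.compile(
    r"^\s*(?P<daemon>mds\.\S+?)\((?P<rank>mds\.\d+)\): "
    r"Behind on trimming \((?P<num>\d+)/(?P<max>\d+)\)"
)

# The detail lines copied from the example in this article.
SAMPLE = """\
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.cephfs.edon1.xxxyyy(mds.0): 569 slow requests are blocked > 30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.cephfs.edon1.xxxyyy(mds.0): Behind on trimming (431866/128) max_segments: 128, num_segments: 431866
"""


def summarize_mds_warnings(health_detail):
    """Return one dict per MDS slow-request or trimming detail line."""
    findings = []
    for line in health_detail.splitlines():
        m = SLOW_RE.match(line)
        if m:
            findings.append({
                "check": "MDS_SLOW_REQUEST",
                "daemon": m.group("daemon"),
                "rank": m.group("rank"),
                "count": int(m.group("count")),
                "threshold_secs": int(m.group("secs")),
            })
            continue
        m = TRIM_RE.match(line)
        if m:
            num, limit = int(m.group("num")), int(m.group("max"))
            findings.append({
                "check": "MDS_TRIM",
                "daemon": m.group("daemon"),
                "rank": m.group("rank"),
                "num_segments": num,
                "max_segments": limit,
                # How many times larger the journal is than its limit.
                "ratio": num / limit,
            })
    return findings


if __name__ == "__main__":
    for finding in summarize_mds_warnings(SAMPLE):
        print(finding)
```

For the example in this article, the trimming entry works out to a journal more than 3,000 times larger than max_segments, which is why this case is treated differently from a brief trimming backlog.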

This KCS applies specifically to MDS slow requests (also known as blocked ops) that are many hours old, on an MDS that has not been restarted.
In general, restarting the MDS while it is behind on trimming is an extremely bad idea, since the oversized journal typically has to be replayed before an MDS can become active again.

Environment

Red Hat OpenShift Data Foundation (RHODF) v4.x
Red Hat OpenShift Container Storage (RHOCS) v4.x
Red Hat OpenShift Container Platform (RHOCP) v4.x
Red Hat Ceph Storage (RHCS) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x
