Ceph: Cluster in HEALTH_WARN with 1 MDSs report slow requests and 1 MDSs behind on trimming.
Issue
Cluster in HEALTH_WARN with "1 MDSs report slow requests" and "1 MDSs behind on trimming".
Example:
[pao@edon1 ~]$ ceph -s
  cluster:
    id:     8d23xxxx-Redacted-Cluster-ID-yyyya00794f2
    health: HEALTH_WARN
            1 MDSs report slow requests
            1 MDSs behind on trimming

  services:
    mon: 3 daemons, quorum edon3,edon2,edon1 (age 6d)
    mgr: edon1(active, since 6d), standbys: edon1, edon1
    mds: 2/2 daemons up, 1 standby
    osd: 324 osds: 324 up (since 6d), 324 in (since 6d)

  data:
    volumes: 1/1 healthy
    pools:   13 pools, 10625 pgs
    objects: 101.79M objects, 51 TiB
    usage:   158 TiB used, 458 TiB / 616 TiB avail
    pgs:     10625 active+clean

[pao@edon1 ~]$ ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
    mds.cephfs.edon1.xxxyyy(mds.0): 569 slow requests are blocked > 30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
    mds.cephfs.edon1.xxxyyy(mds.0): Behind on trimming (431866/128) max_segments: 128, num_segments: 431866
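
The MDS_TRIM line above reports the journal backlog as (num_segments/max_segments). As a rough illustration of how to read it, the sketch below pulls those two numbers out of `ceph health detail` text and prints how many times over the limit the journal is. The function name `trim_backlog` is hypothetical and not part of Ceph, and the pattern assumes the message format shown in the example above.

```shell
#!/usr/bin/env bash
# trim_backlog (hypothetical helper, not a Ceph command):
# reads `ceph health detail` text on stdin and, for every
# "Behind on trimming (num/max)" line, prints the daemon name,
# both segment counts, and the integer ratio num/max.
trim_backlog() {
  sed -n 's|^[[:space:]]*\(mds\.[^(]*\)([^)]*): Behind on trimming (\([0-9]*\)/\([0-9]*\)).*|\1 \2 \3|p' |
  while read -r daemon num max; do
    # Skip malformed lines rather than divide by zero.
    [ "${max:-0}" -gt 0 ] || continue
    echo "$daemon num_segments=$num max_segments=$max ratio=$(( num / max ))x"
  done
}

# Possible usage on a node with the ceph CLI and a valid keyring:
#   ceph health detail | trim_backlog
```

For the example output above, this reports a backlog of roughly 3373 times max_segments, which gives a sense of how much journal the MDS still has to trim.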
This KCS is specific to MDS slow requests (also known as blocked ops) that are many hours old, on an MDS that has not been restarted.
In general, restarting an MDS while it is behind on trimming is an extremely bad idea.
Environment
Red Hat OpenShift Data Foundation (RHODF) v4.x
Red Hat OpenShift Container Storage (RHOCS) v4.x
Red Hat OpenShift Container Platform (RHOCP) v4.x
Red Hat Ceph Storage (RHCS) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x