Ceph: Cluster in HEALTH_WARN with 1 MDSs report slow requests, object in deadlock between unlink and rename.
Issue
The cluster is in HEALTH_WARN with "1 MDSs report slow requests", and an object is deadlocked between an unlink request and a rename request.
Example:
[root@edon-0 ~]# ceph -s
cluster:
id: b613dxxx-redacted-cluster-ID-xxx9a4423a75
health: HEALTH_WARN
1 MDSs report slow requests
1 MDSs behind on trimming <--- This appears only if the issue remains unresolved for many hours
services:
mon: 3 daemons, quorum edon-8,edon-4,edon-7 (age 13d)
mgr: edon-3.xxxyyy(active, since 13d), standbys: edon-2.xxxyyy
mds: 5/5 daemons up, 3 standby
osd: 91 osds: 91 up (since 13d), 91 in (since 9M)
data:
volumes: 1/1 healthy
pools: 4 pools, 2306 pgs
objects: 33.58M objects, 39 TiB
usage: 159 TiB used, 839 TiB / 998 TiB avail
pgs: 2306 active+clean
io:
client: 3.7 MiB/s rd, 2.5 MiB/s wr, 8 op/s rd, 257 op/s wr
[root@edon-0 ~]# ceph health detail
HEALTH_WARN 1 MDSs report slow requests
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
mds.root.edon-3.wnboxv(mds.1): 3 slow requests are blocked > 30 secs
[user@edon-0 ~]# ceph fs status
root - 88 clients
====
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active root.edon-1.amvgfe Reqs: 8 /s 76.1k 75.0k 1501 38.9k
1 active root.edon-3.wnboxv Reqs: 121 /s 5321k 5309k 12.1k 494k
2 active root.edon-7.oqqvka Reqs: 0 /s 5370 3575 289 3710
3 active root.edon-9.qvfdrs Reqs: 1 /s 8184k 8178k 84.2k 434k
4 active root.edon-5.lbrqru Reqs: 556 /s 319k 307k 23.2k 45.9k
POOL TYPE USED AVAIL
cephfs.meta metadata 46.2G 12.8T
cephfs.data data 124T 246T
STANDBY MDS
root.edon-2.rdnzhn
root.edon-6.kfqjor
root.edon-4.pjrzki
MDS version: ceph version 17.2.6-100.el9cp (ea4e3ef8df2cf26540aae06479df031dcfc80343) quincy (stable)
[root@edon-0 ~]# ceph tell mds.1 dump_blocked_ops | sort <--- If the output is extremely long, pipe it to head -20
"initiated_at": "2023-08-07T11:43:12.239105+0000",
"initiated_at": "2023-08-07T11:43:12.239425+0000",
"initiated_at": "2023-08-07T11:43:12.239488+0000",
[root@edon-0 ~]# ceph tell mds.1 dump_blocked_ops | grep description
"description": "client_request(client.61253774:24406803 unlink #0x100013e7b0d/file17 2023-08-07T11:43:12.237640+0000 caller_uid=842788, caller_gid=667140{})",
"description": "client_request(mds.1:36824 rename #0x100013e7b0d/file17 #0x60c/20009ac9901 caller_uid=0, caller_gid=0{})",
"description": "client_request(mds.1:36825 rename #0x100013e7b0d/file17 #0x60c/20009ac9901 caller_uid=0, caller_gid=0{})",
Note that all 3 requests reference the same object (file17) and were initiated at essentially the same time.
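When there are many blocked operations, spotting the shared object by eye gets harder. The following is a minimal sketch, not part of the original solution, that groups the "description" strings by their first path argument and reports objects targeted by both an unlink and a rename. The regular expression is an assumption based only on the three description lines shown above, and the function names (group_by_object, conflicting_objects) are hypothetical helpers.

```python
import re
from collections import defaultdict

# "description" values copied from the example output above.
DESCRIPTIONS = [
    "client_request(client.61253774:24406803 unlink #0x100013e7b0d/file17 "
    "2023-08-07T11:43:12.237640+0000 caller_uid=842788, caller_gid=667140{})",
    "client_request(mds.1:36824 rename #0x100013e7b0d/file17 "
    "#0x60c/20009ac9901 caller_uid=0, caller_gid=0{})",
    "client_request(mds.1:36825 rename #0x100013e7b0d/file17 "
    "#0x60c/20009ac9901 caller_uid=0, caller_gid=0{})",
]

# Assumed layout: client_request(<requester> <op> #<inode>/<name> ...
PATTERN = re.compile(r"client_request\((\S+) (\w+) (#\S+)")


def group_by_object(descriptions):
    """Map each first path argument to a list of (op, requester) pairs."""
    grouped = defaultdict(list)
    for text in descriptions:
        match = PATTERN.search(text)
        if match:
            requester, op, target = match.groups()
            grouped[target].append((op, requester))
    return dict(grouped)


def conflicting_objects(descriptions):
    """Return only the objects hit by both an unlink and a rename."""
    result = {}
    for target, entries in group_by_object(descriptions).items():
        ops = {op for op, _ in entries}
        if {"unlink", "rename"} <= ops:
            result[target] = entries
    return result


if __name__ == "__main__":
    for target, entries in conflicting_objects(DESCRIPTIONS).items():
        print(target)
        for op, requester in entries:
            print("   ", op, "from", requester)
```

To use it on a live cluster, the sample list would be replaced with the description strings taken from the dump_blocked_ops output; lines that do not match the assumed layout are skipped silently.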
Environment
Red Hat Ceph Storage (RHCS) 5.3.4
Red Hat Ceph Storage (RHCS) 5.3.5
Red Hat Ceph Storage (RHCS) 6.0.0
Red Hat Ceph Storage (RHCS) 6.1.0
Red Hat Ceph Storage (RHCS) 6.1.1