Ceph/ODF: MDS rapidly allocates memory until it is OOM-killed or exhausts the node's memory, or reports "1 clients failing to respond to capability release"
Issue
The Ceph MDS very rapidly allocates memory until it is either OOM-killed or exhausts the memory of the node hosting the MDS.
The same code issue can also manifest as a "HEALTH_WARN" with "1 clients failing to respond to capability release":
-bash 5.1 $ ceph -s
  cluster:
    id:     8d23xxxx-redacted-cluster-ID-yyyya00794f2
    health: HEALTH_WARN
            1 clients failing to respond to capability release

  services:
    mon: 3 daemons, quorum cefesp000003,cefesp000002,cefesp000001 (age 11w)
    mgr: cefesp000003(active, since 2M), standbys: cefesp000002, cefesp000001
    mds: 1/1 daemons up, 2 standby
    osd: 321 osds: 321 up (since 12d), 321 in (since 2w)

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 13729 pgs
    objects: 106.82M objects, 61 TiB
    usage:   189 TiB used, 421 TiB / 610 TiB avail
    pgs:     13728 active+clean
             1     active+clean+scrubbing+deep

  io:
    client: 729 MiB/s rd, 345 MiB/s wr, 2.89k op/s rd, 4.47k op/s wr
-bash 5.1 $ ceph health detail
HEALTH_WARN 1 clients failing to respond to capability release
[WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
mds.cephfs.cefesp000002.gruurg(mds.0): Client smeesp000032.mydomain.org:csi-cephfs-node failing to respond to capability release client_id: 390442015
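To confirm which session is holding the unreleased capabilities, the client_id reported in the warning can be matched against the MDS session list. A diagnostic sketch: the MDS daemon name and client id below are taken from the example output above and will differ on other clusters.

```shell
# List all client sessions on the active MDS and filter for the client_id
# reported by "ceph health detail" (390442015 in this example).
# "num_caps" shows how many capabilities the session currently holds;
# "client_metadata" identifies the node and mount behind the session.
ceph tell mds.cephfs.cefesp000002.gruurg session ls | \
  jq '.[] | select(.id == 390442015)
        | {id, num_caps, client_metadata}'
```

A session with an unusually high num_caps relative to the other sessions is typically the one pinning MDS memory and triggering the warning.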
Environment
- Red Hat OpenShift Container Platform (OCP) 4.x
- Red Hat OpenShift Container Storage (OCS) 4.x
- Red Hat OpenShift Data Foundation (ODF) 4.x
- Red Hat Enterprise Linux (RHEL) 9.x
- RHEL 9.2 with kernel 5.14.0-284.66.1.el9_2 or higher
- RHEL 9.4 with kernel 5.14.0-427.17.1.el9_4 or higher
- Red Hat Ceph Storage (RHCS) 5.x
- Red Hat Ceph Storage (RHCS) 6.x
- Red Hat Ceph Storage (RHCS) 7.x
- Red Hat Ceph Storage (RHCS) 8.x
- Ceph File System (CephFS)