Ceph: MDS service crashes with "log_channel(cluster) log [ERR] : MDS abort because newly corrupt dentry to be committed"
Issue
The MDS service crashes with "log_channel(cluster) log [ERR] : MDS abort because newly corrupt dentry to be committed"
Log signature matching this issue:
2023-05-01T19:39:06.443+0000 7f5897d9c700 -1 mds.0.cache.den(0x1 a) newly corrupt dentry to be committed: [dentry #0x1/a [fffffffffffffff6,head] auth (dversion lock) v=13 ino=0x10000000000 state=1610612736 | inodepin=1 dirty=1 0x563595394280]
2023-05-01T19:39:06.443+0000 7f5897d9c700 10 mds.0.cache.dir(0x1) go_bad_dentry a
2023-05-01T19:39:06.443+0000 7f5897d9c700 -1 log_channel(cluster) log [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry #0x1/a [fffffffffffffff6,head] auth (dversion lock) v=13 ino=0x10000000000 state=1610612736 | inodepin=1 dirty=1 0x563595394280]
2023-05-01T19:39:06.447+0000 7f5897d9c700 -1 CDentry.cc: In function 'bool CDentry::check_corruption(bool)' thread 7f5897d9c700 time 2023-05-01T19:39:06.444023+0000
2023-05-01T19:39:06.448+0000 7f5897d9c700 -1 CDentry.cc: 717: ceph_abort_msg("abort() called")
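To check whether a given MDS log contains this signature, you can filter for the cluster-log abort message and pull out the dentry path it names. The sketch below assumes the MDS log is available as plain text on standard input. The function name `find_corrupt_dentries` is hypothetical and not part of Ceph. It keeps only the `MDS abort because newly corrupt dentry to be committed` lines, so the preceding debug line with the same dentry is not counted twice.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Ceph): print each distinct dentry path
# named in "MDS abort because newly corrupt dentry to be committed" messages.
find_corrupt_dentries() {
  # Keep only the cluster-log abort lines (fixed-string match),
  # extract the token after "[dentry " (for example "#0x1/a"),
  # and de-duplicate the result.
  grep -F 'MDS abort because newly corrupt dentry to be committed' |
    sed -n 's/.*\[dentry \([^ ]*\) .*/\1/p' |
    sort -u
}
```

Usage: pipe the MDS daemon log into it, for example `find_corrupt_dentries < mds.log`, where `mds.log` stands in for wherever your deployment writes the MDS log. For the log excerpt above, it prints `#0x1/a`.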
The crash is reported in the cluster health status and the crash list as follows:
$ ceph health detail
HEALTH_WARN 1 daemons have recently crashed
RECENT_CRASH 1 daemons have recently crashed
mds.ocs-storagecluster-cephfilesystem-a crashed on host rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-867d4475mhm7v at 2023-05-05 12:55:28.496127Z
$ ceph crash ls
ID ENTITY NEW
2023-05-05_12:55:28.496127Z_143e559f-0ce6-4677-917f-84b925325af4 mds.ocs-storagecluster-cephfilesystem-a *
Environment
Red Hat OpenShift Data Foundation 4.x
Red Hat Ceph Storage (RHCS) 5.3.3
Red Hat Ceph Storage (RHCS) 6.1.x