ODF: MDS pods in CrashLoopBackOff (CLBO) with "EMetaBlob.replay" and "sessionmap" in the traceback
Issue
MDS pods are in CrashLoopBackOff (CLBO), and EMetaBlob.replay and sessionmap appear in the traceback.
The Ceph MDS service is in CrashLoopBackOff (CLBO). To continue with this article, the signature of the crash must be similar to the example below.
$ oc get pods | grep rook-ceph-mds-ocs
NAME READY STATUS RESTARTS AGE
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-b9bd569fbdkk5 1/2 Running 384 1d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5bbccc9d88zs5 1/2 Running 383 1d
$ oc get events
Pod openshift-storage/rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5bbccc9d88zs5 (mds) is in waiting state (reason: "CrashLoopBackOff")
Pod openshift-storage/rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-b9bd569fbdkk5 (mds) is in waiting state (reason: "CrashLoopBackOff")
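The crash signature can also be retrieved from the logs of the previously crashed MDS container. For example, using one of the pod names above and assuming the default openshift-storage namespace and the mds container name shown in the events output:
$ oc -n openshift-storage logs rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-b9bd569fbdkk5 -c mds --previous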
Crash signature:
Note EMetaBlob.replay and sessionmap in the traceback:
$ ceph crash ls
$ ceph crash info <crash-id>
debug 2023-01-13 01:22:37.619 7fddbca74700 1 mds.0.175006 waiting for osdmap 47157 (which blacklists prior instance)
debug 2023-01-13 01:22:37.638 7fddb6267700 0 mds.0.cache creating system inode with ino:0x100
debug 2023-01-13 01:22:37.638 7fddb6267700 0 mds.0.cache creating system inode with ino:0x1
/builddir/build/BUILD/ceph-14.2.11/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7fddb4a64700 time 2023-01-13 01:22:37.721236
/builddir/build/BUILD/ceph-14.2.11/src/mds/journal.cc: 1551: FAILED ceph_assert(g_conf()->mds_wipe_sessions)
debug 2023-01-13 01:22:37.719 7fddb4a64700 -1 log_channel(cluster) log [ERR] : EMetaBlob.replay sessionmap v 145397255 - 1 > table 0
ceph version 14.2.11-208.el8cp (6738ba96f296a41c24357c12e8d594fbde457abc) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7fddc6847308]
2: (()+0x275522) [0x7fddc6847522]
3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x6b54) [0x55b8999231b4]
4: (EUpdate::replay(MDSRank*)+0x40) [0x55b899925740]
5: (MDLog::_replay_thread()+0xbee) [0x55b8998c49ae]
6: (MDLog::ReplayThread::entry()+0x11) [0x55b8996299c1]
7: (()+0x817a) [0x7fddc462717a]
8: (clone()+0x43) [0x7fddc313edc3]
*** Caught signal (Aborted) **
in thread 7fddb4a64700 thread_name:md_log_replay
debug 2023-01-13 01:22:37.720 7fddb4a64700 -1 /builddir/build/BUILD/ceph-14.2.11/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7fddb4a64700 time 2023-01-13 01:22:37.721236
/builddir/build/BUILD/ceph-14.2.11/src/mds/journal.cc: 1551: FAILED ceph_assert(g_conf()->mds_wipe_sessions)
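In ODF, the ceph crash commands above are typically run from the rook-ceph toolbox pod. A sketch, assuming the toolbox has been enabled in the openshift-storage namespace:
$ TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name)
$ oc -n openshift-storage rsh $TOOLS_POD ceph crash ls
$ oc -n openshift-storage rsh $TOOLS_POD ceph crash info <crash-id>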
Environment
Red Hat OpenShift Container Storage (OCS) 4.x
Red Hat OpenShift Container Platform (OCP) 4.x
Red Hat OpenShift Data Foundation (ODF) 4.x
Red Hat Ceph Storage (RHCS) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x