ODF: MDS pods in CrashLoopBackOff (CLBO) with "EMetaBlob.replay" and "sessionmap" in the traceback


Issue

MDS pods are in CrashLoopBackOff (CLBO), and "EMetaBlob.replay" and "sessionmap" appear in the traceback.

The Ceph MDS service is in CrashLoopBackOff (CLBO). To continue with this article, the crash signature must be similar to the example below.

$ oc get pods | grep rook-ceph-mds-ocs
NAME                                                             READY  STATUS   RESTARTS  AGE
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-b9bd569fbdkk5  1/2    Running  384       1d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5bbccc9d88zs5  1/2    Running  383       1d

$ oc get events
Pod openshift-storage/rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5bbccc9d88zs5 (mds) is in waiting state (reason: "CrashLoopBackOff")
Pod openshift-storage/rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-b9bd569fbdkk5 (mds) is in waiting state (reason: "CrashLoopBackOff")
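The full traceback can usually be retrieved from the logs of the crashing mds container. A minimal example, assuming the pod names shown above and that the Ceph daemon container inside the MDS pod is named mds (verify the container name in the pod spec if it differs):

$ oc logs -n openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-b9bd569fbdkk5 -c mds
$ oc logs -n openshift-storage rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-b9bd569fbdkk5 -c mds --previous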

Crash signature:
Note the EMetaBlob.replay sessionmap assertion in the traceback:

$ ceph crash ls
$ ceph crash info <crash-id>

debug 2023-01-13 01:22:37.619 7fddbca74700  1 mds.0.175006  waiting for osdmap 47157 (which blacklists prior instance)
debug 2023-01-13 01:22:37.638 7fddb6267700  0 mds.0.cache creating system inode with ino:0x100
debug 2023-01-13 01:22:37.638 7fddb6267700  0 mds.0.cache creating system inode with ino:0x1
/builddir/build/BUILD/ceph-14.2.11/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7fddb4a64700 time 2023-01-13 01:22:37.721236
/builddir/build/BUILD/ceph-14.2.11/src/mds/journal.cc: 1551: FAILED ceph_assert(g_conf()->mds_wipe_sessions)
debug 2023-01-13 01:22:37.719 7fddb4a64700 -1 log_channel(cluster) log [ERR] : EMetaBlob.replay sessionmap v 145397255 - 1 > table 0
 ceph version 14.2.11-208.el8cp (6738ba96f296a41c24357c12e8d594fbde457abc) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x156) [0x7fddc6847308]
 2: (()+0x275522) [0x7fddc6847522]
 3: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x6b54) [0x55b8999231b4]
 4: (EUpdate::replay(MDSRank*)+0x40) [0x55b899925740]
 5: (MDLog::_replay_thread()+0xbee) [0x55b8998c49ae]
 6: (MDLog::ReplayThread::entry()+0x11) [0x55b8996299c1]
 7: (()+0x817a) [0x7fddc462717a]
 8: (clone()+0x43) [0x7fddc313edc3]
*** Caught signal (Aborted) **
 in thread 7fddb4a64700 thread_name:md_log_replay
debug 2023-01-13 01:22:37.720 7fddb4a64700 -1 /builddir/build/BUILD/ceph-14.2.11/src/mds/journal.cc: In function 'void EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)' thread 7fddb4a64700 time 2023-01-13 01:22:37.721236
/builddir/build/BUILD/ceph-14.2.11/src/mds/journal.cc: 1551: FAILED ceph_assert(g_conf()->mds_wipe_sessions)
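The ceph crash commands above must be run from a pod with access to the Ceph admin keyring, typically the rook-ceph toolbox. A minimal sketch, assuming the toolbox has been enabled in the openshift-storage namespace and carries the app=rook-ceph-tools label:

$ TOOLS_POD=$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name)
$ oc rsh -n openshift-storage $TOOLS_POD
sh-4.4$ ceph crash ls
sh-4.4$ ceph crash info <crash-id>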

Environment

Red Hat OpenShift Container Storage (OCS) 4.x
Red Hat OpenShift Container Platform (OCP) 4.x
Red Hat OpenShift Data Foundation (ODF) 4.x
Red Hat Ceph Storage (RHCS) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x
