OCS/ODF HEALTH_ERR 1 filesystem is degraded, 1 filesystem is offline, 1 mds daemon damaged - Monitors have assigned me to become a standby
Issue
OCP applications cannot access (read or write) any PV backed by CephFS.
Ceph status shows no active MDS daemon:
$ more ceph_status
  cluster:
    id:     xxxxe547-xxxx-4b75-976b-xxxxxxxxx
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged

  services:
    mon: 3 daemons, quorum e,m,o (age 65m)
    mgr: a(active, since 83m)
    mds: 0/1 daemons up, 2 standby   <<-----
    osd: 6 osds: 6 up (since 63m), 6 in (since 2d)

  data:
    volumes: 0/1 healthy, 1 recovering; 1 damaged
    pools:   11 pools, 369 pgs
    objects: 27.75k objects, 18 GiB
    usage:   52 GiB used, 24 TiB / 24 TiB avail
    pgs:     369 active+clean
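When triaging a capture like the one above, it can help to filter the saved `ceph status` output down to the lines that indicate the failure. A minimal sketch, with the sample text inlined from the output above so the script is self-contained (the `/tmp` path is just an example):

```shell
# Save the relevant excerpt of the captured `ceph status` output
# (inlined here for illustration; normally this file already exists).
cat > /tmp/ceph_status.txt <<'EOF'
  health: HEALTH_ERR
          1 filesystem is degraded
          1 filesystem is offline
          1 mds daemon damaged
  mds: 0/1 daemons up, 2 standby
EOF

# Print only the lines relevant to the MDS/filesystem failure.
grep -E 'filesystem|mds' /tmp/ceph_status.txt
```

All four printed lines should appear; in a healthy cluster the same filter would instead show an active MDS and no filesystem errors.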
$ ceph health detail
HEALTH_ERR 1 filesystem is degraded; 1 filesystem is offline; 1 mds daemon damaged
[WRN] FS_DEGRADED: 1 filesystem is degraded
fs ocs-storagecluster-cephfilesystem is degraded
[ERR] MDS_ALL_DOWN: 1 filesystem is offline
fs ocs-storagecluster-cephfilesystem is offline because no MDS is active for it.
[ERR] MDS_DAMAGE: 1 mds daemon damaged
fs ocs-storagecluster-cephfilesystem mds.0 is damaged
$ ceph mds stat
ocs-storagecluster-cephfilesystem:0/1 2 up:standby, 1 damaged
# ceph fs dump
[mds.ocs-storagecluster-cephfilesystem-b{-1:70220731} state up:standby seq 1 addr [v2:10.131.56.12:6800/1431004327,v1:10.131.56.12:6801/1431004327]]
[mds.ocs-storagecluster-cephfilesystem-a{-1:70254582} state up:standby seq 2 addr [v2:10.131.54.12:6800/1914593486,v1:10.131.54.12:6801/1914593486]]
All OCS-relevant pods are up and running, including the MDS pods:
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7f9d485fcpwbf 2/2 Running 0 1h25m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-589b69b4x6q6c 2/2 Running 0 1h8m
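To re-collect the same diagnostics on an ODF cluster, the usual route is the rook-ceph toolbox pod. A hedged sketch that only prints the `oc` invocations rather than executing them, so it is safe to run anywhere; the `openshift-storage` namespace and the `rook-ceph-tools` deployment name are the ODF defaults and may differ on your cluster:

```shell
# Build the oc commands used to gather the diagnostics shown above via
# the rook-ceph toolbox (assumes the toolbox is enabled and deployed as
# "rook-ceph-tools" in "openshift-storage" -- ODF defaults).
NS=openshift-storage
CMDS=$(for cmd in "ceph status" "ceph health detail" "ceph mds stat" "ceph fs dump"; do
  echo "oc -n $NS exec deploy/rook-ceph-tools -- $cmd"
done)
echo "$CMDS"

# The MDS pods themselves:
echo "oc -n $NS get pods -l app=rook-ceph-mds"
```

Run the printed commands directly (or pipe through `sh`) once you have verified the namespace and toolbox name match your deployment.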
Environment
OCS/ODF 4.x