Ceph/ODF: xxx PGs not scrubbed in time plus 1 PG stuck in "recovering".
Environment
Red Hat OpenShift Container Platform (OCP) 4.x
Red Hat OpenShift Container Storage (OCS) 4.x
Red Hat OpenShift Data Foundation (ODF) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x
Red Hat Ceph Storage (RHCS) 7.x
Issue
xxx PGs not scrubbed in time plus 1 PG stuck in "recovering".
Example:
bash-5.1$ ceph -s
  cluster:
    id:     ce4a9f0d-7c13-44e5-8821-99208abf801a
    health: HEALTH_WARN
            Degraded data redundancy: 21/8035722 objects degraded (0.000%), 7 pgs degraded
            125 pgs not deep-scrubbed in time [1]
            125 pgs not scrubbed in time [1]
            3 slow ops, oldest one blocked for 199591 sec, osd.8 has slow ops

  services:
    mon: 3 daemons, quorum a,b,d (age 2d)
    mgr: a(active, since 2d)
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 2d), 9 in (since 9M)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 377 pgs
    objects: 2.68M objects, 6.8 TiB
    usage:   20 TiB used, 16 TiB / 36 TiB avail
    pgs:     21/8035722 objects degraded (0.000%)
             362 active+clean
             7   active+recovery_wait+degraded
             7   active+recovery_wait
             1   active+recovering [2]

  io:
    client: 188 KiB/s rd, 8.6 MiB/s wr, 8 op/s rd, 394 op/s wr
[1] PGs not scrubbed in time.
[2] PG stuck in recovering.
Resolution
- To resolve the issue, follow the steps in the Workaround below.
- If time allows, first follow the Diagnostic Steps to gather artifacts regarding this issue, then open a Support case with Red Hat and reference this KCS article, #7063971.
- For non-Red Hat Ceph customers, filing an Upstream Tracker issue should be an option.
Workaround:
Issue the command ceph osd down osd.{id} to soft-reset the given OSD: marking it down forces it to re-peer when it reports itself back up, without restarting the OSD process.
1.) Determine the Primary OSD for the PG in recovering.
bash-5.1$ ceph pg dump | egrep -i "DEGRADED|recovering"
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING
9.30 1026 1 1 0 0 4294320128 0 0 2473 2473 active+recovery_wait+degraded 2024-04-03T07:17:39.013476+0000 9140'24449033 9140:32103942 [8,6,7] 8 [8,6,7]
12.1a 2499 6 6 0 0 3606355452 0 0 2500 2500 active+recovery_wait+degraded 2024-04-03T07:17:40.998794+0000 9140'25121639 9140:169241235 [8,7,5] 8 [8,7,5]
9.17 1003 1 1 0 0 4193726464 0 0 2497 2497 active+recovery_wait+degraded 2024-04-03T07:17:39.980501+0000 9139'23728092 9139:26213909 [8,1,6] 8 [8,1,6]
12.13 2402 2 2 0 0 3527951616 0 0 2484 2484 active+recovery_wait+degraded 2024-04-03T07:17:39.117875+0000 9140'24417443 9140:143515390 [8,5,1] 8 [8,5,1]
12.17 2491 4 4 0 0 3589733167 0 0 2472 2472 active+recovery_wait+degraded 2024-04-03T07:17:39.964419+0000 9140'24378962 9140:164066864 [8,4,5] 8 [8,4,5]
12.2 2513 1 1 0 0 3492070631 0 0 2497 2497 active+recovery_wait+degraded 2024-04-03T07:17:39.260229+0000 9140'23437166 9140:158902874 [8,1,0] 8 [8,1,0]
12.3 2538 6 6 0 0 3669926438 0 0 2432 2432 active+recovery_wait+degraded 2024-04-03T07:17:40.015330+0000 9140'25151871 9140:161510958 [8,6,7] 8 [8,6,7]
12.7 2516 5 0 0 0 3627864861 0 0 2423 2423 active+recovering 2024-04-03T07:19:02.339503+0000 9139'24045830 9139:168882678 [8,0,1] 8 [8,0,1]
dumped all
The Primary OSD is the left-most number in the brackets, []. In this example, it's osd.8.
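If you prefer not to pick the primary out of the raw dump by eye, the same information can be pulled with ceph pg ls and jq. This is a minimal sketch, not part of the original procedure; it assumes jq is available in the toolbox pod and that your Ceph release emits the pg_stats and acting_primary fields in the JSON output:
bash-5.1$ ceph pg ls recovering -f json | jq -r '.pg_stats[] | "\(.pgid) primary=osd.\(.acting_primary)"'
In this example it should print PG 12.7 with osd.8 as its acting primary.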
2.) Issue the ceph osd down command.
bash-5.1$ ceph osd down osd.8
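ceph osd down only marks the OSD down in the OSD map; the daemon keeps running and should report itself back up within a few seconds, which forces its PGs to re-peer. A quick check, as a sketch using this example's osd.8 and PG 12.7:
bash-5.1$ ceph osd tree | grep -w 'osd.8'          ## should show the OSD up again shortly
bash-5.1$ ceph pg 12.7 query | grep -m1 '"state"'  ## current state of the previously recovering PG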
3.) In a few minutes the issue is resolved:
bash-5.1$ ceph -s
  cluster:
    id:     ce4a9f0d-7c13-44e5-8821-99208abf801a
    health: HEALTH_WARN
            125 pgs not deep-scrubbed in time [3]
            125 pgs not scrubbed in time [3]

  services:
    mon: 3 daemons, quorum a,b,d (age 2d)
    mgr: a(active, since 2d)
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 20s), 9 in (since 9M)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 377 pgs
    objects: 2.68M objects, 6.8 TiB
    usage:   20 TiB used, 16 TiB / 36 TiB avail
    pgs:     363 active+clean
             8   active+clean+scrubbing+deep [2]
             3   active+clean+snaptrim_wait [1]
             2   active+clean+snaptrim [1]
             1   active+clean+scrubbing [2]

  io:
    client: 290 KiB/s rd, 8.9 MiB/s wr, 15 op/s rd, 397 op/s wr
We now see snaptrim cleaning up capacity [1].
We see scrubbing running [2], and there is still much to do [3].
These things will now sort themselves out over time.
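If you want to keep an eye on the scrub and recovery backlog while it drains, polling the health detail is usually enough. A minimal sketch (assuming watch is available in your toolbox pod; otherwise re-run the command periodically):
bash-5.1$ watch -n 60 "ceph health detail | egrep -i 'scrub|recover|slow'"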
Diagnostic Steps
If your organization has Ceph or ODF subscriptions with Red Hat, follow these steps to gather this information and attach it to a support case.
1.) Determine the Primary OSD for the PG in recovering.
bash-5.1$ ceph pg dump | egrep -i "DEGRADED|recovering"
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING
9.30 1026 1 1 0 0 4294320128 0 0 2473 2473 active+recovery_wait+degraded 2024-04-03T07:17:39.013476+0000 9140'24449033 9140:32103942 [8,6,7] 8 [8,6,7]
12.1a 2499 6 6 0 0 3606355452 0 0 2500 2500 active+recovery_wait+degraded 2024-04-03T07:17:40.998794+0000 9140'25121639 9140:169241235 [8,7,5] 8 [8,7,5]
9.17 1003 1 1 0 0 4193726464 0 0 2497 2497 active+recovery_wait+degraded 2024-04-03T07:17:39.980501+0000 9139'23728092 9139:26213909 [8,1,6] 8 [8,1,6]
12.13 2402 2 2 0 0 3527951616 0 0 2484 2484 active+recovery_wait+degraded 2024-04-03T07:17:39.117875+0000 9140'24417443 9140:143515390 [8,5,1] 8 [8,5,1]
12.17 2491 4 4 0 0 3589733167 0 0 2472 2472 active+recovery_wait+degraded 2024-04-03T07:17:39.964419+0000 9140'24378962 9140:164066864 [8,4,5] 8 [8,4,5]
12.2 2513 1 1 0 0 3492070631 0 0 2497 2497 active+recovery_wait+degraded 2024-04-03T07:17:39.260229+0000 9140'23437166 9140:158902874 [8,1,0] 8 [8,1,0]
12.3 2538 6 6 0 0 3669926438 0 0 2432 2432 active+recovery_wait+degraded 2024-04-03T07:17:40.015330+0000 9140'25151871 9140:161510958 [8,6,7] 8 [8,6,7]
12.7 2516 5 0 0 0 3627864861 0 0 2423 2423 active+recovering 2024-04-03T07:19:02.339503+0000 9139'24045830 9139:168882678 [8,0,1] 8 [8,0,1]
dumped all
The Primary OSD is the left-most number in the brackets, []. In this example, it's osd.8.
Please note: osd.8 is also showing Slow Ops in the ceph -s output above.
Throughout this example osd.8 is used; please replace osd.8 with the OSD you find to be at issue.
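To double-check which OSD is holding the slow ops before changing any settings, the blocked operations can be listed directly from the daemon. A short sketch using this example's osd.8:
bash-5.1$ ceph health detail | grep -i 'slow ops'
bash-5.1$ ceph tell osd.8 dump_blocked_ops | head -n 40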
2.) Increase DEBUG logging:
bash-5.1$ ceph config set osd.8 debug_osd 20
bash-5.1$ ceph config set global debug_ms 1
bash-5.1$ sleep 3m ## Wait at least 3 minutes
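Before moving on, it is worth verifying that the overrides took effect. A quick sketch, again using osd.8 from this example:
bash-5.1$ ceph config get osd.8 debug_osd   ## should reflect the value set above
bash-5.1$ ceph config get osd.8 debug_ms    ## should reflect the global override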
3.) Gather logs.
- For standalone (or external) Ceph, generate an SOS report on the node hosting the OSD using --all-logs.
- For internal Ceph (ODF), run an ODF Must Gather.
On the node hosting the problematic OSD, gather all the Ceph logs (ODF only):
bash-5.1$ oc debug node/<node-name>
bash-5.1$ chroot /host
bash-5.1$ cd /var/lib/rook/openshift-storage/log
bash-5.1$ tar czvf /tmp/osd.8.tgz .
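The archive then has to come off the node. One way to retrieve it, as a sketch (run from a workstation with oc access, using the path created above; a second debug session streams the file over stdout):
bash-5.1$ oc debug node/<node-name> -- cat /host/tmp/osd.8.tgz > osd.8.tgz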
4.) Gather metrics directly from the problematic OSD.
O=8; D=$(date "+%F_%H-%M").osd.${O}   ## set O to the id of the problematic OSD (osd.8 in this example)
ceph tell osd.${O} config diff > osd.${D}.config_diff.out
ceph tell osd.${O} config show > osd.${D}.config_show.out
ceph tell osd.${O} dump_blocked_ops > osd.${D}.dump_blocked_ops.out
ceph tell osd.${O} dump_blocklist > osd.${D}.dump_blocklist.out
ceph tell osd.${O} dump_historic_ops > osd.${D}.dump_historic_ops.out
ceph tell osd.${O} dump_historic_ops_by_duration > osd.${D}.dump_historic_ops_by_duration.out
ceph tell osd.${O} dump_historic_slow_ops > osd.${D}.dump_historic_slow_ops.out
ceph tell osd.${O} dump_mempools > osd.${D}.dump_mempools.out
ceph tell osd.${O} dump_objectstore_kv_stats > osd.${D}.dump_objectstore_kv_stats.out
ceph tell osd.${O} dump_op_pq_state > osd.${D}.dump_op_pq_state.out
ceph tell osd.${O} dump_ops_in_flight > osd.${D}.dump_ops_in_flight.out
ceph tell osd.${O} dump_osd_network > osd.${D}.dump_osd_network.out
ceph tell osd.${O} perf dump > osd.${D}.perf_dump.out
ceph tell osd.${O} perf histogram dump > osd.${D}.perf_histogram_dump.out
ceph tell osd.${O} perf histogram schema > osd.${D}.perf_histogram_schema.out
tar zcvf /tmp/osd.X.tgz *out
5.) Follow the Workaround in the Resolution section above.
6.) Remove DEBUG logging:
bash-5.1$ ceph config rm osd.8 debug_osd
bash-5.1$ ceph config rm global debug_ms
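To confirm the overrides are gone, check that they no longer appear in the configuration database. A short sketch:
bash-5.1$ ceph config dump | egrep 'debug_(osd|ms)'   ## the two overrides set above should no longer be listed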
7.) Gather a second Must Gather or SOS (whichever applies).
8.) Attach the Must Gather and/or SOS bundles along with the osd.X.tgz file to a Red Hat support case.