Ceph/ODF: xxx PGs not scrubbed in time plus 1 PG stuck in "recovering".


Environment

Red Hat OpenShift Container Platform (OCP) 4.x
Red Hat OpenShift Container Storage (OCS) 4.x
Red Hat OpenShift Data Foundation (ODF) 4.x
Red Hat Ceph Storage (RHCS) 5.x
Red Hat Ceph Storage (RHCS) 6.x
Red Hat Ceph Storage (RHCS) 7.x

Issue

xxx PGs not scrubbed in time plus 1 PG stuck in "recovering".

Example:

bash-5.1$ ceph -s
  cluster:
    id:     ce4a9f0d-7c13-44e5-8821-99208abf801a
    health: HEALTH_WARN
            Degraded data redundancy: 21/8035722 objects degraded (0.000%), 7 pgs degraded
            125 pgs not deep-scrubbed in time      [1]
            125 pgs not scrubbed in time           [1]
            3 slow ops, oldest one blocked for 199591 sec, osd.8 has slow ops

  services:
    mon: 3 daemons, quorum a,b,d (age 2d)
    mgr: a(active, since 2d)
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 2d), 9 in (since 9M)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 377 pgs
    objects: 2.68M objects, 6.8 TiB
    usage:   20 TiB used, 16 TiB / 36 TiB avail
    pgs:     21/8035722 objects degraded (0.000%)
             362 active+clean
             7   active+recovery_wait+degraded
             7   active+recovery_wait
             1   active+recovering             [2]

  io:
    client:   188 KiB/s rd, 8.6 MiB/s wr, 8 op/s rd, 394 op/s wr

[1] PGs not scrubbed in time.
[2] PG stuck in recovering.

Resolution

  • To resolve the issue, follow the steps in the Workaround below.
  • If time allows, follow the Diagnostic Steps to gather artifacts regarding this issue.
  • Open a support case with Red Hat, referencing this KCS article, #7063971.
  • For non-Red Hat Ceph users, filing an issue in the upstream Ceph tracker is an option.

Workaround:
Issue the command ceph osd down osd.{id} to soft-reset the given OSD. The OSD daemon is not actually restarted: marking it down forces it to re-assert itself as up and re-peer its PGs, which clears the stuck recovery.

1.) Determine the Primary OSD for the PG stuck in recovering.

bash-5.1$ ceph pg dump | egrep -i "DEGRADED|recovering"
PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES        OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG  STATE                          STATE_STAMP                      VERSION        REPORTED        UP       UP_PRIMARY  ACTING
9.30        1026                   1         1          0        0   4294320128            0           0  2473      2473  active+recovery_wait+degraded  2024-04-03T07:17:39.013476+0000  9140'24449033   9140:32103942  [8,6,7]           8  [8,6,7]
12.1a       2499                   6         6          0        0   3606355452            0           0  2500      2500  active+recovery_wait+degraded  2024-04-03T07:17:40.998794+0000  9140'25121639  9140:169241235  [8,7,5]           8  [8,7,5]
9.17        1003                   1         1          0        0   4193726464            0           0  2497      2497  active+recovery_wait+degraded  2024-04-03T07:17:39.980501+0000  9139'23728092   9139:26213909  [8,1,6]           8  [8,1,6]
12.13       2402                   2         2          0        0   3527951616            0           0  2484      2484  active+recovery_wait+degraded  2024-04-03T07:17:39.117875+0000  9140'24417443  9140:143515390  [8,5,1]           8  [8,5,1]
12.17       2491                   4         4          0        0   3589733167            0           0  2472      2472  active+recovery_wait+degraded  2024-04-03T07:17:39.964419+0000  9140'24378962  9140:164066864  [8,4,5]           8  [8,4,5]
12.2        2513                   1         1          0        0   3492070631            0           0  2497      2497  active+recovery_wait+degraded  2024-04-03T07:17:39.260229+0000  9140'23437166  9140:158902874  [8,1,0]           8  [8,1,0]
12.3        2538                   6         6          0        0   3669926438            0           0  2432      2432  active+recovery_wait+degraded  2024-04-03T07:17:40.015330+0000  9140'25151871  9140:161510958  [8,6,7]           8  [8,6,7]
12.7        2516                   5         0          0        0   3627864861            0           0  2423      2423              active+recovering  2024-04-03T07:19:02.339503+0000  9139'24045830  9139:168882678  [8,0,1]           8  [8,0,1]
dumped all

The Primary OSD is the left-most number in the brackets, []. In this example, it's osd.8.
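
If you prefer to extract this programmatically, here is a minimal sketch (assuming jq is available, e.g. in the rook-ceph-tools pod) that lists the PGs in the recovering state together with their acting Primary OSD:

bash-5.1$ ceph pg ls recovering -f json | jq -r '.pg_stats[] | "\(.pgid)  primary=osd.\(.acting_primary)"'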

2.) Issue ceph osd down command.

bash-5.1$ ceph osd down osd.8
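
As an optional sanity check (not part of the original procedure), confirm the OSD re-asserts itself as up again within a few seconds:

bash-5.1$ ceph osd tree | grep 'osd\.8 '     ## the STATUS column should read "up"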

3.) Within a few minutes, the issue is resolved:

bash-5.1$ ceph -s
  cluster:
    id:     ce4a9f0d-7c13-44e5-8821-99208abf801a
    health: HEALTH_WARN
            125 pgs not deep-scrubbed in time [3]
            125 pgs not scrubbed in time      [3]

  services:
    mon: 3 daemons, quorum a,b,d (age 2d)
    mgr: a(active, since 2d)
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 20s), 9 in (since 9M)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 377 pgs
    objects: 2.68M objects, 6.8 TiB
    usage:   20 TiB used, 16 TiB / 36 TiB avail
    pgs:     363 active+clean
             8   active+clean+scrubbing+deep [2]
             3   active+clean+snaptrim_wait  [1]
             2   active+clean+snaptrim       [1]
             1   active+clean+scrubbing      [2]

  io:
    client:   290 KiB/s rd, 8.9 MiB/s wr, 15 op/s rd, 397 op/s wr

We now see snaptrim cleaning up capacity [1].
We see scrubbing running [2], and there is still much to do [3].
These things will now sort themselves out over time.
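
If you want to watch the scrub and snaptrim backlog drain, a simple polling loop works from the toolbox pod (the 30-second interval is arbitrary; Ctrl-C to stop):

bash-5.1$ while true; do ceph -s | grep -E 'scrub|snaptrim|clean'; sleep 30; done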

Diagnostic Steps

If your organization has Ceph or ODF subscriptions with Red Hat, follow these steps to gather this information and attach it to a support case.

1.) Determine the Primary OSD for the PG stuck in recovering.

bash-5.1$ ceph pg dump | egrep -i "DEGRADED|recovering"
PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES        OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG  STATE                          STATE_STAMP                      VERSION        REPORTED        UP       UP_PRIMARY  ACTING
9.30        1026                   1         1          0        0   4294320128            0           0  2473      2473  active+recovery_wait+degraded  2024-04-03T07:17:39.013476+0000  9140'24449033   9140:32103942  [8,6,7]           8  [8,6,7]
12.1a       2499                   6         6          0        0   3606355452            0           0  2500      2500  active+recovery_wait+degraded  2024-04-03T07:17:40.998794+0000  9140'25121639  9140:169241235  [8,7,5]           8  [8,7,5]
9.17        1003                   1         1          0        0   4193726464            0           0  2497      2497  active+recovery_wait+degraded  2024-04-03T07:17:39.980501+0000  9139'23728092   9139:26213909  [8,1,6]           8  [8,1,6]
12.13       2402                   2         2          0        0   3527951616            0           0  2484      2484  active+recovery_wait+degraded  2024-04-03T07:17:39.117875+0000  9140'24417443  9140:143515390  [8,5,1]           8  [8,5,1]
12.17       2491                   4         4          0        0   3589733167            0           0  2472      2472  active+recovery_wait+degraded  2024-04-03T07:17:39.964419+0000  9140'24378962  9140:164066864  [8,4,5]           8  [8,4,5]
12.2        2513                   1         1          0        0   3492070631            0           0  2497      2497  active+recovery_wait+degraded  2024-04-03T07:17:39.260229+0000  9140'23437166  9140:158902874  [8,1,0]           8  [8,1,0]
12.3        2538                   6         6          0        0   3669926438            0           0  2432      2432  active+recovery_wait+degraded  2024-04-03T07:17:40.015330+0000  9140'25151871  9140:161510958  [8,6,7]           8  [8,6,7]
12.7        2516                   5         0          0        0   3627864861            0           0  2423      2423              active+recovering  2024-04-03T07:19:02.339503+0000  9139'24045830  9139:168882678  [8,0,1]           8  [8,0,1]
dumped all

The Primary OSD is the left-most number in the brackets, []. In this example, it's osd.8.
Please note: osd.8 is also showing Slow Ops in the ceph -s output above.
Throughout this example, osd.8 is used; replace osd.8 with the OSD you find to be at issue.
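
To cross-check which OSDs are currently reporting slow ops (osd.8 in this example), grep the ceph health detail output:

bash-5.1$ ceph health detail | grep -i 'slow ops'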

2.) Increase DEBUG logging:

bash-5.1$ ceph config set osd.8 debug_osd 20
bash-5.1$ ceph config set global debug_ms 1
bash-5.1$ sleep 3m     ## Wait at least 3 minutes
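
Optionally verify the overrides took effect before gathering logs (expected values shown as trailing comments; debug settings display as level/gather-level pairs):

bash-5.1$ ceph config get osd.8 debug_osd    ## expect 20/20
bash-5.1$ ceph config get osd.8 debug_ms     ## expect 1/1, inherited from global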

3.) Gather logs.

For standalone (or external) Ceph, generate an SOS report on the node hosting the OSD, using --all-logs.
For internal Ceph (ODF), run an ODF Must Gather.
On the node hosting the problematic OSD, gather all the Ceph logs (ODF only):

bash-5.1$ oc debug node/<node-name>
bash-5.1$ chroot /host
bash-5.1$ cd /var/lib/rook/openshift-storage/log
bash-5.1$ tar czvf /tmp/osd.8.tgz .
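
One way to copy the archive off the node is to stream it through a second debug pod (a sketch, assuming oc access from a workstation; the file written to /tmp inside the chroot appears as /host/tmp from the debug pod's point of view):

bash-5.1$ oc debug node/<node-name> -- cat /host/tmp/osd.8.tgz > osd.8.tgz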

4.) Gather metrics directly from the problematic OSD.

O=8; D=$(date '+%F_%H-%M').osd.${O}     ## use ${O} so the filenames follow the chosen OSD
ceph tell osd.${O} config diff > osd.${D}.config_diff.out
ceph tell osd.${O} config show > osd.${D}.config_show.out
ceph tell osd.${O} dump_blocked_ops > osd.${D}.dump_blocked_ops.out
ceph tell osd.${O} dump_blocklist > osd.${D}.dump_blocklist.out
ceph tell osd.${O} dump_historic_ops > osd.${D}.dump_historic_ops.out
ceph tell osd.${O} dump_historic_ops_by_duration > osd.${D}.dump_historic_ops_by_duration.out
ceph tell osd.${O} dump_historic_slow_ops > osd.${D}.dump_historic_slow_ops.out
ceph tell osd.${O} dump_mempools > osd.${D}.dump_mempools.out
ceph tell osd.${O} dump_objectstore_kv_stats > osd.${D}.dump_objectstore_kv_stats.out
ceph tell osd.${O} dump_op_pq_state > osd.${D}.dump_op_pq_state.out
ceph tell osd.${O} dump_ops_in_flight > osd.${D}.dump_ops_in_flight.out
ceph tell osd.${O} dump_osd_network > osd.${D}.dump_osd_network.out
ceph tell osd.${O} perf dump > osd.${D}.perf_dump.out
ceph tell osd.${O} perf histogram dump > osd.${D}.perf_histogram_dump.out
ceph tell osd.${O} perf histogram schema > osd.${D}.perf_histogram_schema.out

tar zcvf /tmp/osd.${O}.tgz *.out
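
The same ceph tell sequence can also be written as a loop; this convenience sketch produces identical filenames:

O=8; D=$(date '+%F_%H-%M').osd.${O}
for CMD in "config diff" "config show" dump_blocked_ops dump_blocklist \
           dump_historic_ops dump_historic_ops_by_duration dump_historic_slow_ops \
           dump_mempools dump_objectstore_kv_stats dump_op_pq_state \
           dump_ops_in_flight dump_osd_network "perf dump" \
           "perf histogram dump" "perf histogram schema"; do
    ceph tell osd.${O} ${CMD} > "osd.${D}.${CMD// /_}.out"    ## spaces become underscores
done
tar zcvf /tmp/osd.${O}.tgz *.out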

5.) Follow the Workaround in the Resolution section above.

6.) Remove DEBUG logging:

bash-5.1$ ceph config rm osd.8 debug_osd
bash-5.1$ ceph config rm global debug_ms
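
Optionally confirm the overrides are gone; the values should revert to their defaults:

bash-5.1$ ceph config get osd.8 debug_osd
bash-5.1$ ceph config get osd.8 debug_ms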

7.) Gather a second Must Gather or SOS report (whichever applies).

8.) Attach the Must Gather and/or SOS bundles, along with the osd.<id>.tgz archives gathered above, to a Red Hat support case.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
