[Ceph/ODF]: An SSD OSD has "slow ops" and high client IO latency when deleting RBD snapshots in batches.

Solution Verified - Updated -

Issue

An SSD-backed OSD reports "slow ops", and client IO latency is high, while RBD snapshots are being deleted in batches.

Example: In the output below, nearly all of the PGs in the active snaptrim state (19 of the 21 listed) include osd.20 in their OSD set, and osd.20 is the OSD reporting slow ops.

$ ceph status 
  cluster:
    id:     Redacted
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            1 MDSs report slow requests
            8 slow ops, oldest one blocked for 755 sec, osd.20 has slow ops

  services:
    mon: 3 daemons, quorum be,bf,bg (age 24h)
    mgr: b(active, since 24h), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 16 osds: 16 up (since 24h), 16 in (since 3d)

  data:
    volumes: 1/1 healthy
    pools:   4 pools, 657 pgs
    objects: 4.40M objects, 8.0 TiB
    usage:   24 TiB used, 69 TiB / 93 TiB avail
    pgs:     371 active+clean
             260 active+clean+snaptrim_wait
             25  active+clean+snaptrim
             1   active+clean+scrubbing+deep

  io:
    client:   104 KiB/s rd, 830 KiB/s wr, 4 op/s rd, 72 op/s wr

$ ceph pg dump | grep "snaptrim " | cut -c1-250 | sed -e 's/ *$//'
2.fb        4453                   0         0          0        0  16860394496          182           5  2494      3000      2494        active+clean+snaptrim  2025-05-30T07:33:58.801208+0000  184878'209456269  184878:240898235   [6,13,20]
2.d9        4126                   0         0          0        0  15874811486          165           5  2310      3000      2310        active+clean+snaptrim  2025-05-30T07:22:05.863316+0000  184878'186284725  184878:202419495   [9,12,20]
2.d1        4204                   0         0          0        0  16331157154          413          16  2361      3000      2361        active+clean+snaptrim  2025-05-30T07:34:38.117882+0000  184878'200127472  184878:228118610    [7,3,20]
2.c9        4627                   0         0          0        0  17536827922            0           0  2334      3000      2334        active+clean+snaptrim  2025-05-30T07:22:05.862362+0000  184878'183062380  184878:210773020    [20,4,9]
2.b1        4336                   0         0          0        0  16377914530          639          40  2328      3000      2328        active+clean+snaptrim  2025-05-30T07:33:41.896198+0000  184878'224808727  184878:290845308   [20,10,0]
2.84        4047                   0         0          0        0  15485811386          168           5  2552      3000      2552        active+clean+snaptrim  2025-05-30T07:32:31.309154+0000  184878'222252133  184878:236729486  [12,20,13]
2.83        4292                   0         0          0        0  16547249350          363          15  2486      3000      2486        active+clean+snaptrim  2025-05-30T07:33:27.684351+0000  184878'268035684  184878:296568854    [6,9,20]
2.29        4232                   0         0          0        0  16335212544          707          25  2512      3000      2512        active+clean+snaptrim  2025-05-30T07:35:34.898580+0000  184878'293565182  184878:322682305   [11,20,6]
2.15        4399                   0         0          0        0  16766092072          239           8  2718      3000      2718        active+clean+snaptrim  2025-05-30T07:34:32.404531+0000  184878'243693131  184878:270964014    [1,0,20]
2.42        4218                   0         0          0        0  16185806884            0           0  2379      3000      2379        active+clean+snaptrim  2025-05-30T07:35:57.717792+0000  184878'254681113  184878:300503812    [3,7,20]
2.6b        4087                   0         0          0        0  15647022503          413          16  2549      3000      2549        active+clean+snaptrim  2025-05-30T07:30:24.466126+0000  184878'190120868  184878:225720216   [10,20,7]
2.70        4100                   0         0          0        0  15665591362          435          16  2639      3000      2639        active+clean+snaptrim  2025-05-30T07:35:48.609720+0000  184878'221198954  184878:249457354   [15,4,20]
2.74        4218                   0         0          0        0  16273674406           76           8  2444      3000      2444        active+clean+snaptrim  2025-05-30T07:32:53.794684+0000  184878'189143746  184878:217428080    [0,20,1]
2.103       4042                   0         0          0        0  15554657280          718          30  2419      3000      2419        active+clean+snaptrim  2025-05-30T07:31:00.115382+0000  184878'266151592  184878:294483261   [0,13,20]
2.109       3890                   0         0          0        0  14803629346        10172          49  2587      3000      2587        active+clean+snaptrim  2025-05-30T07:35:29.707839+0000  184878'264042549  184878:283312923   [13,0,20]
2.10a       4427                   0         0          0        0  16706313216          478          19  2381      3000      2381        active+clean+snaptrim  2025-05-30T07:35:24.598778+0000  184878'202604431  184878:222881307    [9,1,20]
2.11a       4228                   0         0          0        0  15971760470         3163          48  2619      3000      2619        active+clean+snaptrim  2025-05-30T07:35:57.717739+0000  184878'217259743  184878:268351123  [15,13,20]
2.185       4452                   0         0          0        0  16917721142            0           0  2547      3000      2547        active+clean+snaptrim  2025-05-30T07:35:25.115743+0000  184878'264031313  184878:300341474   [4,20,12]
2.1b2       4206                   0         0          0        0  16425524386          380          15  2516      3000      2516        active+clean+snaptrim  2025-05-30T07:33:42.669719+0000  184878'206472954  184878:235053712    [4,19,6]
2.1c1       4091                   0         0          0        0  15438998726          575          21  2355      3000      2355        active+clean+snaptrim  2025-05-30T07:36:11.617496+0000  184878'194786402  184878:218405612    [7,11,0]
2.1fc       4044                   0         0          0        0  15450713038          294           9  2425      3000      2425        active+clean+snaptrim  2025-05-30T07:32:53.794799+0000  184878'194622529  184878:220838653   [3,10,20]
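
To confirm that snapshot trimming is concentrated on one OSD, the listing above can be tallied per OSD instead of read by eye. The following is a minimal sketch, not part of the original solution: the function name count_snaptrim_by_osd is hypothetical, and it assumes that the first bracketed list on each row is the PG's OSD set, as in the truncated output above. It counts only PGs in the active snaptrim state and skips snaptrim_wait.

```shell
# Hypothetical helper: reads "ceph pg dump" rows on stdin and prints
# how many actively trimming PGs each OSD participates in.
count_snaptrim_by_osd() {
  awk '
    # Keep rows whose state contains "snaptrim" as a whole token,
    # which excludes "snaptrim_wait".
    /(^|[+ \t])snaptrim([+ \t]|$)/ {
      # Assumption: the first bracketed list on the row is the OSD set.
      if (match($0, /\[[0-9,]+\]/)) {
        ids = substr($0, RSTART + 1, RLENGTH - 2)
        count = split(ids, osd, ",")
        for (i = 1; i <= count; i++) pgs[osd[i]]++
      }
    }
    END { for (id in pgs) print "osd." id, pgs[id] }
  ' | sort -k2,2nr -k1,1
}
```

Usage, assuming the same command as in the example above: `ceph pg dump | count_snaptrim_by_osd`. An OSD that appears in far more trimming PGs than its peers, as osd.20 does here, is the likely source of the slow ops.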

Environment

  • Red Hat OpenShift Container Platform (OCP) 4.x
  • Red Hat OpenShift Data Foundation (ODF) 4.x
  • Red Hat Ceph Storage 4
  • Red Hat Ceph Storage 5
  • Red Hat Ceph Storage 6
  • Red Hat Ceph Storage 7
  • Red Hat Ceph Storage 8
  • Ceph (RADOS) Block Devices (RBD)
