Placement groups found in inconsistent state in RGW multisite configurations after upgrading to RHCS 5.0z4

Solution Verified - Updated -

Issue

  • After an upgrade to RHCS 5.0z4 (packages 16.2.0-152.el8cp), in a Ceph cluster with RADOS Gateway (RGW) running in a multisite configuration, the ceph status command reports inconsistent PGs. Sample output:

    ceph status
      cluster:
        id:     CLUSTER_ID
        health: HEALTH_ERR
                3 scrub errors
                Possible data damage: 3 pgs inconsistent
    
      services:
        mon: 3 daemons, quorum host03,host02,host01 (age 7d)
        mgr: host02(active, since 12d), standbys: host01, host03
        osd: 20 osds: 20 up (since 7d), 20 in (since 7d)
        rgw: 6 daemons active (6 hosts, 2 zones)
    
      data:
        pools:   15 pools, 721 pgs
        objects: 3.35M objects, 946 GiB
        usage:   3.1 TiB used, 25 TiB / 28 TiB avail
        pgs:     718 active+clean
                 3   active+clean+inconsistent
    
      io:
        client:   10 KiB/s rd, 0 B/s wr, 10 op/s rd, 0 op/s wr
    
  • In all reported cases, the affected placement groups belong to the RGW log and bucket index pools. Sample outputs:

    ceph health detail
    
    HEALTH_ERR 3 scrub errors; Possible data damage: 3 pgs inconsistent
    [ERR] OSD_SCRUB_ERRORS: 3 scrub errors
    [ERR] PG_DAMAGED: Possible data damage: 3 pgs inconsistent
    pg 7.5 is active+clean+inconsistent, acting [16,18,19]
    pg 7.a is active+clean+inconsistent, acting [19,7,13]
    pg 9.1a is active+clean+inconsistent, acting [5,15,12] 
    
    ceph osd pool ls detail
    
    pool 7 'zone1.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3820 lfor 0/3587/3591 flags hashpspool stripe_width 0 target_size_ratio 0.005 application rgw
    pool 9  'zone2.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3821 flags hashpspool stripe_width 0 target_size_ratio 0.03 application rgw
    
  • Why does this issue occur, and how can it be prevented?
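
The link between the inconsistent PGs and their pools in the sample outputs above follows from the PG ID format: the number before the dot is the pool ID (pg 7.5 belongs to pool 7, pg 9.1a to pool 9). The following is a minimal Python sketch of that lookup, not a supported tool. It assumes the text of ceph health detail and ceph osd pool ls detail has already been captured; the sample strings are abbreviated from the outputs above, and the function names pool_names and inconsistent_pgs_by_pool are hypothetical.

```python
import re
from collections import defaultdict

# Sample text abbreviated from the outputs shown above (illustrative only).
HEALTH_DETAIL = """\
pg 7.5 is active+clean+inconsistent, acting [16,18,19]
pg 7.a is active+clean+inconsistent, acting [19,7,13]
pg 9.1a is active+clean+inconsistent, acting [5,15,12]
"""

POOL_LS_DETAIL = """\
pool 7 'zone1.rgw.log' replicated size 3 min_size 2 crush_rule 0
pool 9  'zone2.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0
"""

# A PG ID is "<pool id>.<hex suffix>"; match only PGs whose state includes "inconsistent".
PG_LINE = re.compile(
    r"^\s*pg (\d+)\.([0-9a-f]+) is [a-z+_]*inconsistent[a-z+_]*", re.M
)
# Pool lines start with "pool <id> '<name>'"; tolerate extra spaces before the name.
POOL_LINE = re.compile(r"^\s*pool (\d+)\s+'([^']+)'", re.M)


def pool_names(pool_ls_detail):
    """Return a {pool_id: pool_name} mapping parsed from 'ceph osd pool ls detail' text."""
    return {int(pid): name for pid, name in POOL_LINE.findall(pool_ls_detail)}


def inconsistent_pgs_by_pool(health_detail, pool_ls_detail):
    """Group inconsistent PG IDs from 'ceph health detail' text by pool name."""
    names = pool_names(pool_ls_detail)
    grouped = defaultdict(list)
    for pool_id, suffix in PG_LINE.findall(health_detail):
        # Fall back to a placeholder if the pool ID is not in the listing.
        name = names.get(int(pool_id), f"<unknown pool {pool_id}>")
        grouped[name].append(f"{pool_id}.{suffix}")
    return dict(grouped)


if __name__ == "__main__":
    result = inconsistent_pgs_by_pool(HEALTH_DETAIL, POOL_LS_DETAIL)
    for pool, pgs in result.items():
        print(f"{pool}: {', '.join(pgs)}")
```

Run against the sample text, this groups pg 7.5 and pg 7.a under zone1.rgw.log and pg 9.1a under zone2.rgw.buckets.index, which matches the observation that only the log and bucket index pools are affected.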

Environment

  • RHCS 5.0z4 with RGW running in a multisite configuration.
