After an upgrade to RHCS 5.0z4 (packages 16.2.0-152.el8cp), in a Ceph cluster with the RADOS Gateway (RGW) running in a multisite configuration, the command ceph status reports inconsistent PGs. Sample output:
ceph status
  cluster:
    id:     CLUSTER_ID
    health: HEALTH_ERR
            3 scrub errors
            Possible data damage: 3 pgs inconsistent

  services:
    mon: 3 daemons, quorum host03,host02,host01 (age 7d)
    mgr: host02(active, since 12d), standbys: host01, host03
    osd: 20 osds: 20 up (since 7d), 20 in (since 7d)
    rgw: 6 daemons active (6 hosts, 2 zones)

  data:
    pools:   15 pools, 721 pgs
    objects: 3.35M objects, 946 GiB
    usage:   3.1 TiB used, 25 TiB / 28 TiB avail
    pgs:     718 active+clean
             3 active+clean+inconsistent

  io:
    client:   10 KiB/s rd, 0 B/s wr, 10 op/s rd, 0 op/s wr
In all the cases reported, the affected placement groups are part of the bucket index and logs pools. Sample outputs:
ceph health detail
HEALTH_ERR 3 scrub errors; Possible data damage: 3 pgs inconsistent
[ERR] OSD_SCRUB_ERRORS: 3 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 3 pgs inconsistent
    pg 7.5 is active+clean+inconsistent, acting [16,18,19]
    pg 7.a is active+clean+inconsistent, acting [19,7,13]
    pg 9.1a is active+clean+inconsistent, acting [5,15,12]

ceph osd pool ls detail
pool 7 'zone1.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3820 lfor 0/3587/3591 flags hashpspool stripe_width 0 target_size_ratio 0.005 application rgw
pool 9 'zone2.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3821 flags hashpspool stripe_width 0 target_size_ratio 0.03 application rgw
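The mapping shown above (PG ids from ceph health detail, matched to pools via the pool-id prefix of each PG id, e.g. pg 7.5 belongs to pool 7) can also be extracted programmatically from ceph health detail --format json, for example in a monitoring check. The sketch below embeds abridged sample data mirroring the output above; the JSON field names (checks, PG_DAMAGED, detail) follow the Pacific-era health format but are an assumption here and may differ across releases.

```python
import json

# Abridged sample of `ceph health detail --format json`, embedded for
# illustration only. The exact schema can vary between Ceph releases.
SAMPLE = json.dumps({
    "status": "HEALTH_ERR",
    "checks": {
        "OSD_SCRUB_ERRORS": {
            "severity": "HEALTH_ERR",
            "summary": {"message": "3 scrub errors"},
        },
        "PG_DAMAGED": {
            "severity": "HEALTH_ERR",
            "summary": {"message": "Possible data damage: 3 pgs inconsistent"},
            "detail": [
                {"message": "pg 7.5 is active+clean+inconsistent, acting [16,18,19]"},
                {"message": "pg 7.a is active+clean+inconsistent, acting [19,7,13]"},
                {"message": "pg 9.1a is active+clean+inconsistent, acting [5,15,12]"},
            ],
        },
    },
})

def inconsistent_pgs(health_json: str) -> list:
    """Return the PG ids flagged by the PG_DAMAGED health check."""
    health = json.loads(health_json)
    damaged = health.get("checks", {}).get("PG_DAMAGED", {})
    pgs = []
    for entry in damaged.get("detail", []):
        words = entry.get("message", "").split()
        # Detail lines look like: "pg 7.5 is active+clean+inconsistent, ..."
        if len(words) >= 2 and words[0] == "pg":
            pgs.append(words[1])
    return pgs

pgs = inconsistent_pgs(SAMPLE)
print(pgs)  # the three PG ids from the health detail above
# A PG id is <pool-id>.<pg-suffix>, so the affected pool ids are:
print(sorted({pg.split(".")[0] for pg in pgs}))
```

On the cluster shown, this yields PGs 7.5, 7.a, and 9.1a, i.e. pools 7 (zone1.rgw.log) and 9 (zone2.rgw.buckets.index), matching the statement that only the log and bucket index pools are affected.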
Why is this issue occurring, and how can it be prevented?
- RHCS 5.0z4 with RGW running in a multisite configuration.