[Ceph][RGW] Large OMAP object Health warning in the bucket index pool for stale bucket instances


Environment

  • Red Hat Ceph Storage
    • 4.x
    • 3.x
  • RGW Multisite environment

Issue

  • Running ceph health detail reports a large omap warning for the bucket index pool:
HEALTH_WARN 3 large omap objects
LARGE_OMAP_OBJECTS 3 large omap objects
    3 large objects found in pool 'zz1.rgw.buckets.index'
    Search the cluster log for 'Large omap object found' for more details.
  • The bucket ID reported in ceph.log belongs to a deleted bucket that is listed under stale instances

Resolution

  • If the bilog entry count exceeds osd_deep_scrub_large_omap_object_key_threshold (default 200000, i.e. 200k keys), or the total bilog size exceeds osd_deep_scrub_large_omap_object_value_sum_threshold (default 1_G, i.e. 1 GiB):

    • Execute the bilog trim

      $ radosgw-admin bilog trim --bucket="bucket-name" --bucket-id="bucket-id"
      
    • Manually deep-scrub the PG that holds the large omap object (the PG ID is shown in the ceph.log entry) so that the warning is re-evaluated

      $ ceph pg deep-scrub <pgid>
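
The threshold comparison above can be sketched as a small shell check. This is only an illustration: exceeds_omap_threshold is a hypothetical helper (not a Ceph command), and the two limits are the defaults quoted in this article, so a cluster with overridden settings would need different values.

```shell
#!/bin/sh
# Defaults quoted in this article; a cluster may override them (assumption).
KEY_THRESHOLD=200000        # osd_deep_scrub_large_omap_object_key_threshold
SIZE_THRESHOLD=1073741824   # osd_deep_scrub_large_omap_object_value_sum_threshold (1 GiB)

# Hypothetical helper, not part of Ceph.
# Usage: exceeds_omap_threshold <key-count> <size-in-bytes>
# Prints each limit that is exceeded; returns 0 if either is, 1 otherwise.
exceeds_omap_threshold() {
    keys=$1
    bytes=$2
    rc=1
    if [ "$keys" -gt "$KEY_THRESHOLD" ]; then
        echo "key count $keys exceeds $KEY_THRESHOLD"
        rc=0
    fi
    if [ "$bytes" -gt "$SIZE_THRESHOLD" ]; then
        echo "size $bytes exceeds $SIZE_THRESHOLD"
        rc=0
    fi
    return $rc
}

# Key count and size taken from the sample ceph.log entry in this article.
exceeds_omap_threshold 649368 128983232
```

With the sample values, only the key count limit is exceeded; the object size stays below 1 GiB.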
      

Root Cause

  • In a multisite environment, deleted bucket instances are, by design, not removed automatically
  • If the index object of a deleted bucket instance triggers a large omap warning, either the multisite sync has not completed successfully or the bilogs are not being trimmed

Diagnostic Steps

  • In the large omap warning in ceph.log, the object name starts with ".dir.", which indicates that the object belongs to the bucket index pool
2021-9-05 06:54:25.093024 osd.144 (osd.144) 592 : cluster [WRN] Large omap object found. Object: 89:039c6a78:::.dir.e9ff35e0-e817-5363-9d1c-eefe85926cab.32068365.1.1:head PG: 89.1e5639c0 (89.40) Key count: 649368 Size (bytes): 128983232
  • In the above log entry, the bucket ID is e9ff35e0-e817-5363-9d1c-eefe85926cab.32068365.1 (the trailing .1 in the object name is the index shard number)
  • Get the bucket name from the bucket instance metadata list, then confirm that the bucket is not listed in the bucket metadata list or in the bucket stats output, but is listed in the stale instances list.

    1. Get the bucket name

      $ radosgw-admin metadata list --metadata-key bucket.instance | grep -i "bucket-id"
      
    2. Check whether the bucket name appears in the list of live (active) buckets

      $ radosgw-admin metadata list --metadata-key bucket | grep -i "bucket-name"
      

      or

      $ radosgw-admin bucket stats | grep -i "bucket-name"
      
    3. Finally, confirm that the bucket ID appears in the stale bucket instances list

      $ radosgw-admin reshard stale-instances list --yes-i-really-mean-it | grep -i "bucket-id"
      
  • Check the cluster-wide and per-bucket multisite sync status

    1. Cluster sync status

      $ radosgw-admin sync status
      
    2. Bucket sync status

      $ radosgw-admin bucket sync status --bucket="bucket-name" --bucket-id="bucket-id"
      
  • Count the bilog entries for the bucket that shows the large omap warning

    $ radosgw-admin bilog list --bucket="bucket-name" --bucket-id="bucket-id" --max-entries=600000 | grep -c op_id

    or

    $ rados -p zz1.rgw.buckets.index listomapkeys .dir.e9ff35e0-e817-5363-9d1c-eefe85926cab.32068365.1.1 | grep -i "0_000" | wc -l
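
The fields used in these diagnostic steps (bucket ID, shard number, PG ID, key count and size) can be pulled out of the ceph.log entry with standard shell tools. The following is a minimal sketch that works on the sample entry from this article. The variable names are illustrative, and the split into bucket ID and shard assumes a sharded index object named .dir.<bucket-id>.<shard>; an unsharded index object would need different handling.

```shell
#!/bin/sh
# Sample "Large omap object found" entry copied from this article.
line='2021-9-05 06:54:25.093024 osd.144 (osd.144) 592 : cluster [WRN] Large omap object found. Object: 89:039c6a78:::.dir.e9ff35e0-e817-5363-9d1c-eefe85926cab.32068365.1.1:head PG: 89.1e5639c0 (89.40) Key count: 649368 Size (bytes): 128983232'

# Index object name: the text between ":::" and ":head".
object=$(printf '%s\n' "$line" | sed -n 's/.*:::\(\.dir\.[^:]*\):head.*/\1/p')

# Strip the ".dir." prefix, then split off the trailing ".<shard>" suffix
# (assumes a sharded index object).
instance=${object#.dir.}
bucket_id=${instance%.*}
shard=${instance##*.}

# PG ID: the value in parentheses after "PG:".
pgid=$(printf '%s\n' "$line" | sed -n 's/.*PG: [^ ]* (\([^)]*\)).*/\1/p')

# Key count and total size reported by the deep scrub.
key_count=$(printf '%s\n' "$line" | sed -n 's/.*Key count: \([0-9]*\).*/\1/p')
size_bytes=$(printf '%s\n' "$line" | sed -n 's/.*Size (bytes): \([0-9]*\).*/\1/p')

echo "bucket_id=$bucket_id shard=$shard pgid=$pgid keys=$key_count bytes=$size_bytes"
```

The extracted bucket ID can then be substituted for "bucket-id" in the radosgw-admin commands, and the PG ID for <pgid> in the deep-scrub command.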

