Ceph: OSD Slow Requests and OSDs flapping after deleting large RGW objects

Issue

OSD Slow Requests and OSDs flapping after deleting large RGW objects

After a large amount of S3/Swift data is deleted within a short time, a Ceph cluster may experience the following:

OSD slow op warnings
Laggy PGs
OSDs flapping
radosgw-admin gc list stalls

The OSD logs contain errors such as the following:

TIMESTAMP THREAD_NAME /builddir/build/BUILD/ceph-16.2.0/src/cls/queue/cls_queue_src.cc:243: ERROR: No space left in queue
TIMESTAMP THREAD_NAME osd.927 260489 get_health_metrics reporting 223 slow ops, oldest is osd_op(client.64814084.0:173667 5.bf 5:fde4dd55:gc::gc.26:head [call version.check_conds in=74b,call rgw_gc.rgw_gc_queue_enqueue in=653b] snapc 0=[] RETRY=4 ondisk+retry+write+known_if_redirected e260479)
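
To check whether an OSD log shows the two messages quoted above, a small script can count them. The following is a minimal sketch, not a supported tool: the function name scan_osd_log is hypothetical, and the regular expressions are derived only from the two sample lines, so they may need adjusting for other Ceph versions or log formats.

```python
import re

# Both patterns are derived from the sample log lines in this article.
# They are kept loose on purpose (for example, "get_health_metrics" may
# appear with underscores or spaces depending on how the log was copied).
QUEUE_FULL = re.compile(r"cls_queue_src\.cc:\d+: ERROR: No space left in queue")
SLOW_OPS = re.compile(
    r"(osd\.\d+) \d+ get[_ ]health[_ ]metrics reporting (\d+) slow ops"
)


def scan_osd_log(lines):
    """Scan an iterable of log lines.

    Returns a tuple (queue_full_count, slow_ops) where slow_ops maps each
    OSD name to the highest slow-op count it reported.
    """
    queue_full = 0
    slow_ops = {}
    for line in lines:
        if QUEUE_FULL.search(line):
            queue_full += 1
        match = SLOW_OPS.search(line)
        if match:
            osd = match.group(1)
            count = int(match.group(2))
            slow_ops[osd] = max(slow_ops.get(osd, 0), count)
    return queue_full, slow_ops
```

A typical use would be to open an OSD log file and pass the file object to scan_osd_log; a non-zero first value together with rising slow-op counts matches the symptoms described in this article.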

The issue can be reproduced by creating three 900 GB S3 objects and later deleting them. The symptoms above appear some time after the deletion, from minutes to a couple of hours. Creating 3,000 objects of 900 MB each and later deleting them all at once triggers the same issue.
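
Both reproduction cases delete the same total volume of data, about 2.7 TB, which the short calculation below makes explicit. It is only a rough sketch under stated assumptions: decimal units (1 GB = 10^9 bytes) and a 4 MiB RGW stripe size (the upstream default for rgw_obj_stripe_size) are assumed, not taken from this article, and the RADOS object counts ignore multipart layout details.

```python
# Assumption: decimal units; the article does not say whether GB means
# 10**9 or 2**30 bytes.
GB = 10**9
MB = 10**6

# Assumption: 4 MiB stripe size (upstream default for rgw_obj_stripe_size).
# A given cluster may be configured differently.
STRIPE = 4 * 2**20

# (number of S3 objects, size of each object in bytes)
scenarios = {
    "3 x 900 GB": (3, 900 * GB),
    "3000 x 900 MB": (3000, 900 * MB),
}


def summarize(count, size):
    """Return (total_bytes, approximate_rados_objects) for one scenario."""
    total = count * size
    # Ceiling division: each S3 object is stored as stripe-sized RADOS objects.
    rados_per_object = -(-size // STRIPE)
    return total, count * rados_per_object


for name, (count, size) in scenarios.items():
    total, rados_objects = summarize(count, size)
    print(f"{name}: {total / 10**12:.1f} TB, ~{rados_objects} RADOS objects")
```

Under these assumptions the two cases are nearly identical in both total bytes and the number of backing RADOS objects to be cleaned up, which is consistent with both triggering the same symptoms.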

Environment

Red Hat Ceph Storage (RHCS) 4
Red Hat Ceph Storage (RHCS) 5
Red Hat Ceph Storage (RHCS) 6
