OSDs flap and report slow requests when applying a test load in a Ceph cluster
Issue
- While load testing a Ceph cluster with COSBench, the OSDs time out or log "wrongly marked me down" messages.
- The tests are run against the pools used for RGW.
- The benchmark is executed on the RGW pool, which consists of 8 TB SATA disks.
- This started recently; the only change has been the addition of objects to the pools, and not a dramatic increase at that.
- The slowness is not evident when the files are copied to a new folder on the same disk, but the problem appears when the files are copied along with their extended attributes (xattrs).
- Some baseline testing:
# dd if=/dev/zero of=here bs=4M count=256 oflag=direct && dd if=here of=/dev/null bs=4M count=256 && rm -f here
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 7.54084 s, 142 MB/s
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 5.52354 s, 194 MB/s
# dd if=/dev/zero of=here bs=4M count=256 oflag=direct && dd if=here of=/dev/null bs=4M count=256 && rm -f here
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 7.38176 s, 145 MB/s
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 5.21124 s, 206 MB/s
# dd if=/dev/zero of=here bs=4M count=256 oflag=direct && dd if=here of=/dev/null bs=4M count=256 && rm -f here
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 7.49258 s, 143 MB/s
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 5.1737 s, 208 MB/s
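The xattr observation above can be checked outside Ceph with a quick copy comparison on the same disk. This is a hedged sketch, not part of the original report: the paths are placeholders, `setfattr` comes from the `attr` package on RHEL, and the xattr step is skipped if the filesystem does not support `user.*` attributes.

```shell
# Compare a plain copy against an xattr-preserving copy of the same file.
dir=$(mktemp -d)
dd if=/dev/zero of="$dir/src" bs=1M count=16 2>/dev/null

# Attach a test xattr; ignore failure on filesystems without user.* support.
setfattr -n user.demo -v test "$dir/src" 2>/dev/null || true

time cp "$dir/src" "$dir/plain"                   # data only
time cp --preserve=xattr "$dir/src" "$dir/xcopy"  # data + extended attributes
```

If the xattr-preserving copy is consistently slower, that points at xattr reads on the source filesystem rather than at raw disk throughput, which matches the symptom described above.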
Environment
- Red Hat Ceph Storage 1.3