OSDs flapping and slow requests when applying a test load in a Ceph cluster.

Solution In Progress

Issue

  • While load testing a Ceph cluster using cosbench, the OSDs time out or log "wrongly marked me down" messages (see the diagnostic sketch after the baseline output below).

  • The tests are done on the pools used for RGW.

  • The benchmark is executed against the RGW pool, which is backed by 8 TB SATA disks.

  • The issue started recently; the only change has been the addition of objects to the pools, and not a dramatic increase at that.

  • The slowness is not evident when files are copied to a new directory on the same disk, but it does appear when the files are copied together with their extended attributes (xattrs); a copy comparison sketch follows the baseline output below.

  • Some baseline testing of raw disk throughput:

# dd if=/dev/zero of=here bs=4M count=256 oflag=direct && dd if=here of=/dev/null bs=4M count=256 && rm -f here                                                    
256+0 records in                                                                               
256+0 records out                                                                              
1073741824 bytes (1.1 GB) copied, 7.54084 s, 142 MB/s                                          
256+0 records in                                                                               
256+0 records out                                                                              
1073741824 bytes (1.1 GB) copied, 5.52354 s, 194 MB/s                                          

# dd if=/dev/zero of=here bs=4M count=256 oflag=direct && dd if=here of=/dev/null bs=4M count=256 && rm -f here                                                    
256+0 records in                                                                               
256+0 records out                                                                              
1073741824 bytes (1.1 GB) copied, 7.38176 s, 145 MB/s                                          
256+0 records in                                                                               
256+0 records out                                                                              
1073741824 bytes (1.1 GB) copied, 5.21124 s, 206 MB/s                                          

# dd if=/dev/zero of=here bs=4M count=256 oflag=direct && dd if=here of=/dev/null bs=4M count=256 && rm -f here                                                    
256+0 records in                                                                               
256+0 records out                                                                              
1073741824 bytes (1.1 GB) copied, 7.49258 s, 143 MB/s                                          
256+0 records in                                                                               
256+0 records out                                                                              
1073741824 bytes (1.1 GB) copied, 5.1737 s, 208 MB/s
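
To confirm the flapping and the slow requests while the cosbench load runs, a diagnostic pass along the following lines can help. This is a minimal sketch assuming default Red Hat Ceph Storage 1.3 log locations; osd.0 is only an example ID.

# Cluster-wide view: flapping OSDs and slow requests show up here.
ceph -s
ceph health detail

# Which OSDs are currently marked down?
ceph osd tree | grep down

# On each OSD host, check the OSD logs for missed heartbeats and for the
# "wrongly marked me down" message reported in this issue.
grep "wrongly marked me down" /var/log/ceph/ceph-osd.*.log
grep "heartbeat_check: no reply" /var/log/ceph/ceph-osd.*.log

# Heartbeat grace currently in effect on one OSD (run on that OSD's host).
ceph daemon osd.0 config get osd_heartbeat_grace

# While investigating, the nodown flag keeps the monitors from marking
# busy-but-alive OSDs down; unset it once the investigation is done.
ceph osd set nodown
ceph osd unset nodown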
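
The xattr observation above can be reproduced with a timed copy on an OSD disk. The sketch below assumes a FileStore data path and a hypothetical PG directory name (1.0_head); run it on a scratch area or a stopped/out OSD rather than a live one. GNU cp only preserves xattrs when asked, which makes the comparison straightforward.

# Hypothetical FileStore paths; substitute a real PG directory on the OSD disk.
SRC=/var/lib/ceph/osd/ceph-0/current/1.0_head
DST=/var/lib/ceph/osd/ceph-0/current

# Plain copy: file data only, no extended attributes.
time cp -r "$SRC" "$DST/copytest-plain"

# Copy preserving xattrs, where FileStore keeps per-object metadata.
time cp -r --preserve=xattr "$SRC" "$DST/copytest-xattr"

# Inspect the xattrs attached to the objects to see how much metadata travels.
getfattr -d -m - "$SRC"/* 2>/dev/null | head

# Clean up the test copies.
rm -rf "$DST/copytest-plain" "$DST/copytest-xattr"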

Environment

  • Red Hat Ceph Storage 1.3
