OSDs flapping along with slow requests when applying a test load in a Ceph cluster.
Issue
- While load testing a Ceph cluster using cosbench, the OSDs time out or log "wrongly marked me down" messages (see the log check after the baseline tests below).
- The tests are run against the pools used for RGW.
- The benchmark is being executed on the RGW pool, which consists of 8TB SATA disks.
- This started recently, and the only change has been the addition of objects to the pools; even that was not a dramatic increase.
- The slowness is not evident when the files are copied to a new folder on the same disk, but the problem appears when the files are copied together with their extended attributes (xattrs); a reproduction sketch follows the baseline output below.
- Some baseline testing:
# dd if=/dev/zero of=here bs=4M count=256 oflag=direct && dd if=here of=/dev/null bs=4M count=256 && rm -f here
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 7.54084 s, 142 MB/s
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 5.52354 s, 194 MB/s
# dd if=/dev/zero of=here bs=4M count=256 oflag=direct && dd if=here of=/dev/null bs=4M count=256 && rm -f here
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 7.38176 s, 145 MB/s
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 5.21124 s, 206 MB/s
# dd if=/dev/zero of=here bs=4M count=256 oflag=direct && dd if=here of=/dev/null bs=4M count=256 && rm -f here
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 7.49258 s, 143 MB/s
256+0 records in
256+0 records out
1073741824 bytes (1.1 GB) copied, 5.1737 s, 208 MB/s
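The baseline above shows healthy raw throughput on the SATA disks (roughly 142-145 MB/s for the 1 GiB direct writes and 194-208 MB/s for the reads back), so sequential data I/O is not the bottleneck. To reproduce the xattr-related slowness on the same disk, a plain copy can be compared against one that preserves extended attributes. This is a minimal sketch, assuming GNU coreutils cp and getfattr from the attr package; the paths are placeholders:
# time cp -r /path/to/testdir /path/to/copy-plain                    # placeholder paths
# time cp -r --preserve=xattr /path/to/testdir /path/to/copy-xattr   # also copies the xattrs
# getfattr -d -m - /path/to/testdir/somefile                         # dump the xattrs on one file
On Red Hat Ceph Storage 1.3 the OSDs use FileStore on XFS, where object metadata is kept in xattrs, so a large timing difference between the two copies points at metadata rather than data I/O.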
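To confirm the flapping itself, check the OSD logs on the storage nodes for the "wrongly marked me down" messages and the cluster state from a node with admin access. A minimal check, assuming the default log location under /var/log/ceph/:
# grep "wrongly marked me down" /var/log/ceph/ceph-osd.*.log
# ceph health detail
# ceph -s
While investigating, setting the nodown flag with "ceph osd set nodown" stops the monitors from marking the flapping OSDs down, so the maps do not churn under the test load; remove it afterwards with "ceph osd unset nodown".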
Environment
- Red Hat Ceph Storage 1.3