Ceph RGW and RBD latency increases after the Ceph cluster has been running for several months
Issue
- Ceph RGW is slow and its latency has increased.
- Ceph RBD latency has increased and sometimes spikes to over 100 ms.
- Latency was low just after the initial deployment of the Ceph cluster, but it worsened after the cluster had been running for several months or longer.
- In the worst case, some OSDs are flapping with slow requests warning messages.
Environment
- Red Hat Ceph Storage 5
- Red Hat Ceph Storage 6
- BlueStore OSD on NVMe SSD device
- Each OSD uses a single primary device: block (data area), block.db (RocksDB area), and block.wal (WAL area) are collocated on the same NVMe SSD device
- Each NVMe SSD is very large, e.g. 1 TB or more
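To check whether a cluster matches the environment above, one option is to inspect the JSON printed by `ceph osd metadata`. The following is a minimal sketch, not a supported tool. It assumes that each OSD record carries the keys `osd_objectstore`, `bluefs_dedicated_db`, `bluefs_dedicated_wal`, `bluestore_bdev_rotational`, and `bluestore_bdev_size` as strings; verify these against the output of your own release. The helper names and the sample data are hypothetical, and the check only confirms a non-rotational device, not NVMe specifically.

```python
import json

ONE_TB = 10**12  # decimal terabyte in bytes; threshold taken from "1 TB or more"


def matches_environment(osd):
    """Return True if one OSD metadata record looks like the collocated layout.

    Assumption: key names and string-typed values follow `ceph osd metadata`
    output; confirm them on your cluster before relying on this check.
    """
    return (
        osd.get("osd_objectstore") == "bluestore"
        and osd.get("bluefs_dedicated_db") == "0"      # no separate block.db device
        and osd.get("bluefs_dedicated_wal") == "0"     # no separate block.wal device
        and osd.get("bluestore_bdev_rotational") == "0"  # SSD-class device
        and int(osd.get("bluestore_bdev_size", 0)) >= ONE_TB
    )


def matching_osd_ids(metadata_json):
    """Return the ids of all OSDs in the JSON list that match the layout."""
    return [osd["id"] for osd in json.loads(metadata_json) if matches_environment(osd)]


# Made-up sample data for illustration only, not real cluster output.
SAMPLE = json.dumps([
    {"id": 0, "osd_objectstore": "bluestore", "bluefs_dedicated_db": "0",
     "bluefs_dedicated_wal": "0", "bluestore_bdev_rotational": "0",
     "bluestore_bdev_size": "1920383410176"},
    {"id": 1, "osd_objectstore": "bluestore", "bluefs_dedicated_db": "1",
     "bluefs_dedicated_wal": "0", "bluestore_bdev_rotational": "0",
     "bluestore_bdev_size": "1920383410176"},
    {"id": 2, "osd_objectstore": "bluestore", "bluefs_dedicated_db": "0",
     "bluefs_dedicated_wal": "0", "bluestore_bdev_rotational": "0",
     "bluestore_bdev_size": "480103981056"},
])

if __name__ == "__main__":
    print(matching_osd_ids(SAMPLE))
```

In practice the JSON string would come from the saved output of `ceph osd metadata` rather than from the sample. In the sample, OSD 1 is excluded because it has a dedicated block.db device, and OSD 2 because its device is smaller than 1 TB.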