Ceph RGW and RBD latency grows large after the Ceph cluster has been running for several months

Solution Verified - Updated -

Issue

  • Ceph RGW is slow and its latency grows large.
  • Ceph RBD latency grows large and sometimes spikes to over 100 ms.
  • Latency was low just after the initial deployment of the Ceph cluster.
    It gets worse after the cluster has been running for several months or longer.
  • In the worst case, some OSDs flap and log slow request warnings.
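
One way to check whether individual OSDs match these symptoms is to scan the JSON output of `ceph osd perf -f json` for OSDs whose commit latency exceeds the 100 ms figure mentioned above. The following is only a minimal sketch: the helper name `slow_osds`, the sample data, and the assumed JSON layout (an `osd_perf_infos` list, optionally nested under `osdstats`) are illustrative assumptions and should be verified against the output of your own cluster. Note that this reports OSD-side commit latency, which is related to but not the same as the latency seen by RGW or RBD clients.

```python
import json


def slow_osds(perf_json: str, threshold_ms: float = 100.0) -> list:
    """Return (osd_id, commit_latency_ms) pairs above threshold_ms, worst first.

    Hypothetical helper: the JSON layout below is an assumption about
    what `ceph osd perf -f json` prints, not a documented contract.
    """
    data = json.loads(perf_json)
    # Assumed layout: some releases nest the list under "osdstats",
    # others expose "osd_perf_infos" at the top level.
    infos = data.get("osdstats", data).get("osd_perf_infos", [])
    flagged = []
    for info in infos:
        commit = info["perf_stats"]["commit_latency_ms"]
        if commit > threshold_ms:
            flagged.append((info["id"], commit))
    # Sort by latency, highest first.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Made-up sample data for illustration only.
SAMPLE = json.dumps({
    "osdstats": {
        "osd_perf_infos": [
            {"id": 0, "perf_stats": {"commit_latency_ms": 3, "apply_latency_ms": 3}},
            {"id": 1, "perf_stats": {"commit_latency_ms": 250, "apply_latency_ms": 250}},
            {"id": 2, "perf_stats": {"commit_latency_ms": 140, "apply_latency_ms": 140}},
        ]
    }
})

print(slow_osds(SAMPLE))  # [(1, 250), (2, 140)]
```

On a live cluster, the input string would come from running `ceph osd perf -f json` on a node with admin access. Because the values are point-in-time samples, repeating the check over a period of time gives a more reliable picture than a single run.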

Environment

  • Red Hat Ceph Storage 5
  • Red Hat Ceph Storage 6
  • BlueStore OSD on NVMe SSD device
  • Each OSD uses a single primary device: block (data area), block.db (RocksDB area), and block.wal (WAL area) are collocated on the same NVMe SSD device
  • Each NVMe SSD is very large, e.g. 1 TB or more
