How does a large number of objects in a RHCS/Ceph pool affect the filestore merge and split rate, as well as cluster performance?
Issue
- A large number of slow requests and high disk utilization while reading/writing data to pools in the Ceph cluster.
- Several OSDs were dying with either "failed" or "wrongly marked me down" messages, as well as with heartbeat-related stack traces.
- Slow requests evenly distributed across all the SATA disks in the cluster, indicating the problem is cluster-wide and not isolated to any specific disk, host, or rack.
- Listing a PG directory on an OSD took well over 30 seconds to complete due to the high number of files (see the command sketch below).
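These symptoms are consistent with filestore directory splitting: when the number of files under a PG's directory tree grows past the split threshold (derived from the filestore_split_multiple and filestore_merge_threshold settings), the OSD splits directories inline during writes, which can stall client I/O. As a rough sketch only (osd.0 and the PG path below are placeholders, not values from this case), the current settings and a per-PG file count can be checked with:

    # Show the filestore split/merge settings on a running OSD (osd.0 is a placeholder)
    ceph daemon osd.0 config get filestore_split_multiple
    ceph daemon osd.0 config get filestore_merge_threshold

    # Count the files under one PG's _head directory on the OSD data partition
    # (replace the path with an actual PG directory from your cluster)
    find /var/lib/ceph/osd/ceph-0/current/1.0_head -type f | wc -l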
Environment
- Red Hat Ceph Storage
- Ceph Cluster with Filestore OSDs