How does a large number of objects in a RHCS/Ceph pool affect the filestore merge and split rate, as well as cluster performance?
Issue
- A large number of slow requests and high disk utilization while reading/writing data to pools in the Ceph cluster.
- Several OSDs were dying with either "failed" or "wrongly marked me down" messages, as well as with heartbeat-related stack traces.
- Slow requests evenly distributed across all the SATA disks in the cluster, indicating the problem is cluster-wide and not isolated to any specific disk, host, or rack.
- Listing a PG directory on an OSD took well over 30 seconds to complete due to the high number of files (see the command sketch below).
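These symptoms are consistent with filestore directory splitting: when the number of files under a PG's directory tree grows past the split threshold (derived from the filestore_split_multiple and filestore_merge_threshold settings), the OSD splits directories inline during writes, which can stall client I/O. As a rough sketch only (osd.0 and the PG path below are placeholders, not values from this case), the current settings and a per-PG file count can be checked with:

    # Show the filestore split/merge settings on a running OSD (osd.0 is a placeholder)
    ceph daemon osd.0 config get filestore_split_multiple
    ceph daemon osd.0 config get filestore_merge_threshold

    # Count the files under one PG's _head directory on the OSD data partition
    # (replace the path with an actual PG directory from your cluster)
    find /var/lib/ceph/osd/ceph-0/current/1.0_head -type f | wc -l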
Environment
- Red Hat Ceph Storage
- Ceph Cluster with Filestore OSDs