Ceph: Multi site replication is slower than expected, FIFO push destination issue
Issue
Multi site replication is slower than expected, FIFO push destination issue
Issue #1:
In RHCS 5.0 through RHCS 5.2.x, the processing of an internal messaging queue violates basic FIFO semantics. The result can be observed as slow multi site replication or, on a closely monitored system, as a multi site replication stall. After some time, multi site replication recovers and resumes syncing to the remote Ceph cluster.
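A stall or slowdown of this kind typically shows up as shards reported "behind" in the output of `radosgw-admin sync status` on the affected site. As an illustrative sketch only (the sample output below is hypothetical, and real output varies by release and site configuration), a small parser can flag when data sync is lagging:

```python
import re

# Hypothetical excerpt of `radosgw-admin sync status` output; real output
# varies by RHCS release, zone names, and shard counts.
SAMPLE_STATUS = """\
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 1b2c3d4e (secondary)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 17 shards
"""

def behind_shards(status_text):
    """Return the number of shards reported behind, or 0 if caught up."""
    match = re.search(r"behind on (\d+) shards", status_text)
    return int(match.group(1)) if match else 0

print(behind_shards(SAMPLE_STATUS))  # prints 17 for the sample above
```

During the transient stall described above, the "behind" shard count stays flat for a period and then drains once replication recovers; a count that never drains points at a different problem.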
Issue #2:
Prior to RHCS 5.0, the Ceph cluster did not use a FIFO queue to process multi site replication requests. Instead, OMAP objects were used, and these had their own set of issues that resulted in slow multi site replication.
Issue #3:
In RHCS 5.2.x and earlier, there was little to no parallel multi site replication, regardless of the number of Rados Gateways (RGWs) in use.
Environment
Red Hat Ceph Storage (RHCS) 3.x
Red Hat Ceph Storage (RHCS) 4.x
Red Hat Ceph Storage (RHCS) 5.0.x
Red Hat Ceph Storage (RHCS) 5.1.x
Red Hat Ceph Storage (RHCS) 5.2.x
Red Hat Ceph Storage (RHCS) Rados Gateway (RGW)