10.11. Troubleshooting Geo-replication

This section describes the most common troubleshooting scenarios related to geo-replication.

10.11.1. Tuning Geo-replication performance with Change Log

There are options for the change log that can be configured to give better performance in a geo-replication environment.
The rollover-time option sets the rate at which the change log is consumed. The default rollover time is 60 seconds, but it can be configured to a faster rate. A recommended rollover-time for geo-replication is 10-15 seconds. To change the rollover-time option, use following the command:
# gluster volume set VOLNAME rollover-time 15
The fsync-interval option determines the frequency that updates to the change log are written to disk. The default interval is 0, which means that updates to the change log are written synchronously as they occur, and this may negatively impact performance in a geo-replication environment. Configuring fsync-interval to a non-zero value will write updates to disk asynchronously at the specified interval. To change the fsync-interval option, use following the command:
# gluster volume set VOLNAME fsync-interval 3

10.11.2. Triggering Explicit Sync on Entries

Geo-replication provides an option to explicitly trigger the sync operation of files and directories. A virtual extended attribute glusterfs.geo-rep.trigger-sync is provided to accomplish the same.
# setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <file-path>
The support of explicit trigger of sync is supported only for directories and regular files.

10.11.3. Synchronization Is Not Complete

Situation

The geo-replication status is displayed as Stable, but the data has not been completely synchronized.

Solution

A full synchronization of the data can be performed by erasing the index and restarting geo-replication. After restarting geo-replication, it will begin a synchronization of the data using checksums. This may be a long and resource intensive process on large data sets. If the issue persists, contact Red Hat Support.

For more information about erasing the index, see Section 11.1, “Configuring Volume Options”.

10.11.4. Issues with File Synchronization

Situation

The geo-replication status is displayed as Stable, but only directories and symlinks are synchronized. Error messages similar to the following are in the logs:

[2011-05-02 13:42:13.467644] E [master:288:regjob] GMaster: failed to sync ./some_file`
Solution

Geo-replication requires rsync v3.0.0 or higher on the host and the remote machines. Verify if you have installed the required version of rsync.

10.11.5. Geo-replication Status is Often Faulty

Situation

The geo-replication status is often displayed as Faulty, with a backtrace similar to the following:

012-09-28 14:06:18.378859] E [syncdutils:131:log_raise_exception] <top>: FAIL: Traceback (most recent call last): File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twraptf(*aa) File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen rid, exc, res = recv(self.inf) File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv return pickle.load(inf) EOFError
Solution

This usually indicates that RPC communication between the master gsyncd module and slave gsyncd module is broken. Make sure that the following pre-requisites are met:

  • Passwordless SSH is set up properly between the host and remote machines.
  • FUSE is installed on the machines. The geo-replication module mounts Red Hat Gluster Storage volumes using FUSE to sync data.

10.11.6. Intermediate Master is in a Faulty State

Situation

In a cascading environment, the intermediate master is in a faulty state, and messages similar to the following are in the log:

raise RuntimeError ("aborting on uuid change from %s to %s" % \ RuntimeError: aborting on uuid change from af07e07c-427f-4586-ab9f- 4bf7d299be81 to de6b5040-8f4e-4575-8831-c4f55bd41154
Solution

In a cascading configuration, an intermediate master is loyal to its original primary master. The above log message indicates that the geo-replication module has detected that the primary master has changed. If this change was deliberate, delete the volume-id configuration option in the session that was initiated from the intermediate master.

10.11.7. Remote gsyncd Not Found

Situation

The master is in a faulty state, and messages similar to the following are in the log:

[2012-04-04 03:41:40.324496] E [resource:169:errfail] Popen: ssh> bash: /usr/local/libexec/glusterfs/gsyncd: No such file or directory
Solution

The steps to configure a SSH connection for geo-replication have been updated. Use the steps as described in Section 10.3.4.1, “Setting Up your Environment for Geo-replication Session”