Red Hat Training

A Red Hat training course is available for Red Hat Gluster Storage

10.4. Starting Geo-replication

This section describes how to and start geo-replication in your storage environment, and verify that it is functioning correctly.

10.4.1. Starting a Geo-replication Session

Important

You must create the geo-replication session before starting geo-replication. For more information, see Section 10.3.4.1, “Setting Up your Environment for Geo-replication Session”.
To start geo-replication, use one of the following commands:
  • To start the geo-replication session between the hosts:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start
    For example:
    # gluster volume geo-replication Volume1 storage.backup.com::slave-vol start
    Starting geo-replication session between Volume1 & storage.backup.com::slave-vol has been successful
    This command will start distributed geo-replication on all the nodes that are part of the master volume. If a node that is part of the master volume is down, the command will still be successful. In a replica pair, the geo-replication session will be active on any of the replica nodes, but remain passive on the others.
    After executing the command, it may take a few minutes for the session to initialize and become stable.

    Note

    If you attempt to create a geo-replication session and the slave already has data, the following error message will be displayed:
    slave-node::slave is not empty. Please delete existing files in slave-node::slave and retry, or use force to continue without deleting the existing files. geo-replication command failed
  • To start the geo-replication session forcefully between the hosts:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start force
    For example:
    # gluster volume geo-replication Volume1 storage.backup.com::slave-vol start force
    Starting geo-replication session between Volume1 & storage.backup.com::slave-vol has been successful
    This command will force start geo-replication sessions on the nodes that are part of the master volume. If it is unable to successfully start the geo-replication session on any node which is online and part of the master volume, the command will still start the geo-replication sessions on as many nodes as it can. This command can also be used to re-start geo-replication sessions on the nodes where the session has died, or has not started.

10.4.2. Verifying a Successful Geo-replication Deployment

You can use the status command to verify the status of geo-replication in your environment:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol status

10.4.3. Displaying Geo-replication Status Information

The status command can be used to display information about a specific geo-replication master session, master-slave session, or all geo-replication sessions. The status output provides both node and brick level information.
  • To display information about all geo-replication sessions, use the following command:
    # gluster volume geo-replication status [detail]
  • To display information on all geo-replication sessions from a particular master volume, use the following command:
    # gluster volume geo-replication MASTER_VOL status [detail] 
  • To display information of a particular master-slave session, use the following command:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status [detail]

    Important

    There will be a mismatch between the outputs of the df command (including -h and -k) and inode of the master and slave volumes when the data is in full sync. This is due to the extra inode and size consumption by the changelog journaling data, which keeps track of the changes done on the file system on the master volume. Instead of running the df command to verify the status of synchronization, use # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status detail instead.
  • The geo-replication status command output provides the following information:
    • Master Node: Master node and Hostname as listed in the gluster volume info command output
    • Master Vol: Master volume name
    • Master Brick: The path of the brick
    • Slave User: Slave user name
    • Slave: Slave volume name
    • Slave Node: IP address/hostname of the slave node to which master worker is connected to.
    • Status: The status of the geo-replication worker can be one of the following:
      • Initializing: This is the initial phase of the Geo-replication session; it remains in this state for a minute in order to make sure no abnormalities are present.
      • Created: The geo-replication session is created, but not started.
      • Active: The gsync daemon in this node is active and syncing the data.
      • Passive: A replica pair of the active node. The data synchronization is handled by the active node. Hence, this node does not sync any data.
      • Faulty: The geo-replication session has experienced a problem, and the issue needs to be investigated further. For more information, see Section 10.11, “Troubleshooting Geo-replication” section.
      • Stopped: The geo-replication session has stopped, but has not been deleted.
    • Crawl Status: Crawl status can be one of the following:
      • Changelog Crawl: The changelog translator has produced the changelog and that is being consumed by gsyncd daemon to sync data.
      • Hybrid Crawl: The gsyncd daemon is crawling the glusterFS file system and generating pseudo changelog to sync data.
      • History Crawl: The gsyncd daemon consumes the history changelogs produced by the changelog translator to sync data.
    • Last Synced: The last synced time.
    • Entry: The number of pending entry (CREATE, MKDIR, RENAME, UNLINK etc) operations per session.
    • Data: The number of Data operations pending per session.
    • Meta: The number of Meta operations pending per session.
    • Failures: The number of failures. If the failure count is more than zero, view the log files for errors in the Master bricks.
    • Checkpoint Time: Displays the date and time of the checkpoint, if set. Otherwise, it displays as N/A.
    • Checkpoint Completed: Displays the status of the checkpoint.
    • Checkpoint Completion Time: Displays the completion time if Checkpoint is completed. Otherwise, it displays as N/A.

10.4.4. Configuring a Geo-replication Session

To configure a geo-replication session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config [Name] [Value]
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config use_tarssh true
For example, to view the list of all option/value pairs:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config
To delete a setting for a geo-replication config option, prefix the option with ! (exclamation mark). For example, to reset log-level to the default value:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config '!log-level'

Warning

You must ensure to perform these configuration changes when all the peers in cluster are in Connected (online) state. If you change the configuration when any of the peer is down, the geo-replication cluster would be in inconsistent state when the node comes back online.
Configurable Options

The following table provides an overview of the configurable options for a geo-replication setting:

Option Description
gluster-log-file LOGFILE The path to the geo-replication glusterfs log file.
gluster-log-level LOGFILELEVEL The log level for glusterfs processes.
log-file LOGFILE The path to the geo-replication log file.
log-level LOGFILELEVEL The log level for geo-replication.
changelog-log-level LOGFILELEVEL The log level for the changelog. The default log level is set to INFO.
ssh-command COMMAND The SSH command to connect to the remote machine (the default is SSH).
rsync-command COMMAND The rsync command to use for synchronizing the files (the default is rsync).
use-tarssh [true | false] The use-tarssh command allows tar over Secure Shell protocol. Use this option to handle workloads of files that have not undergone edits.
volume_id=UID The command to delete the existing master UID for the intermediate/slave node.
timeout SECONDS The timeout period in seconds.
sync-jobs N
The number of sync-jobs represents the maximum number of syncer threads (rsync processes or tar over ssh processes for syncing) inside each worker. The number of workers is always equal to the number of bricks in the Master volume. For example, a distributed-replicated volume of (3 x 2) with sync-jobs configured at 3 results in 9 total sync-jobs (aka threads) across all nodes/servers.
Active and Passive Workers: The number of active workers is based on the volume configuration. In case of a distribute volume, all bricks (workers) will be active and participate in syncing. In case of replicate or dispersed volume, one worker from each replicate/disperse group (subvolume) will be active and participate in syncing. This is to avoid duplicate syncing from other bricks. The remaining workers in each replicate/disperse group (subvolume) will be passive. In case the active worker goes down, one of the passive worker from the same replicate/disperse group will become an active worker.
ignore-deletes If this option is set to 1, a file deleted on the master will not trigger a delete operation on the slave. As a result, the slave will remain as a superset of the master and can be used to recover the master in the event of a crash and/or accidental delete.
checkpoint [LABEL|now] Sets a checkpoint with the given option LABEL. If the option is set as now, then the current time will be used as the label.
sync-acls [true | false]Syncs acls to the Slave cluster. By default, this option is enabled.

Note

Geo-replication can sync acls only with rsync as the sync engine and not with tarssh as the sync engine.
sync-xattrs [true | false]Syncs extended attributes to the Slave cluster. By default, this option is enabled.

Note

Geo-replication can sync extended attributes only with rsync as the sync engine and not with tarssh as the sync engine.
log-rsync-performance [true | false]If this option is set to enable, geo-replication starts recording the rsync performance in log files. By default, this option is disabled.
rsync-optionsAdditional options to rsync. For example, you can limit the rsync bandwidth usage "--bwlimit=<value>".
use-meta-volume [true | false]Set this option to enable, to use meta volume in Geo-replicaiton. By default, this option is disabled.

Note

More more information on meta-volume, see Section 10.3.5, “Configuring a Meta-Volume”.
meta-volume-mnt PATHThe path of the meta volume mount point.

10.4.4.1. Geo-replication Checkpoints

10.4.4.1.1. About Geo-replication Checkpoints
Geo-replication data synchronization is an asynchronous process, so changes made on the master may take time to be replicated to the slaves. Data replication to a slave may also be interrupted by various issues, such network outages.
Red Hat Gluster Storage provides the ability to set geo-replication checkpoints. By setting a checkpoint, synchronization information is available on whether the data that was on the master at that point in time has been replicated to the slaves.
10.4.4.1.2. Configuring and Viewing Geo-replication Checkpoint Information
  • To set a checkpoint on a geo-replication session, use the following command:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config checkpoint [now|LABEL]
    For example, to set checkpoint between Volume1 and storage.backup.com:/data/remote_dir:
    # gluster volume geo-replication Volume1 storage.backup.com::slave-vol config checkpoint now
    geo-replication config updated successfully
    The label for a checkpoint can be set as the current time using now, or a particular label can be specified, as shown below:
    # gluster volume geo-replication Volume1 storage.backup.com::slave-vol config checkpoint NEW_ACCOUNTS_CREATED
    geo-replication config updated successfully.
  • To display the status of a checkpoint for a geo-replication session, use the following command:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status detail
  • To delete checkpoints for a geo-replication session, use the following command:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config '!checkpoint'
    For example, to delete the checkpoint set between Volume1 and storage.backup.com::slave-vol:
    # gluster volume geo-replication Volume1 storage.backup.com::slave-vol config '!checkpoint'
    geo-replication config updated successfully

10.4.5. Stopping a Geo-replication Session

To stop a geo-replication session, use one of the following commands:
  • To stop a geo-replication session between the hosts:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
    For example:
    # gluster volume geo-replication Volume1 storage.backup.com::slave-vol stop
    Stopping geo-replication session between Volume1 & storage.backup.com::slave-vol has been successful

    Note

    The stop command will fail if:
    • any node that is a part of the volume is offline.
    • if it is unable to stop the geo-replication session on any particular node.
    • if the geo-replication session between the master and slave is not active.
  • To stop a geo-replication session forcefully between the hosts:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop force
    For example:
    # gluster volume geo-replication Volume1 storage.backup.com::slave-vol stop force
    Stopping geo-replication session between Volume1 & storage.backup.com::slave-vol has been successful
    Using force will stop the geo-replication session between the master and slave even if any node that is a part of the volume is offline. If it is unable to stop the geo-replication session on any particular node, the command will still stop the geo-replication sessions on as many nodes as it can. Using force will also stop inactive geo-replication sessions.

10.4.6. Deleting a Geo-replication Session

Important

You must first stop a geo-replication session before it can be deleted. For more information, see Section 10.4.5, “Stopping a Geo-replication Session”.
To delete a geo-replication session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL delete [reset-sync-time]
reset-sync-time: The geo-replication delete command retains the information about the last synchronized time. Due to this, if the same geo-replication session is recreated, then the synchronization will continue from the time where it was left before deleting the session. For the geo-replication session to not maintain any details about the deleted session, use the reset-sync-time option with the delete command. Now, when the session is recreated, it starts synchronization from the beginning just like a new session.
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol delete
geo-replication command executed successfully

Note

The delete command will fail if:
  • any node that is a part of the volume is offline.
  • if it is unable to delete the geo-replication session on any particular node.
  • if the geo-replication session between the master and slave is still active.

Important

The SSH keys will not removed from the master and slave nodes when the geo-replication session is deleted. You can manually remove the pem files which contain the SSH keys from the /var/lib/glusterd/geo-replication/ directory.