Red Hat Training

A Red Hat training course is available for Red Hat Gluster Storage

10.6. Disaster Recovery

When the master volume goes offline, you can perform the following disaster recovery procedures to minimize the interruption of service:

10.6.1. Promoting a Slave to Master

If the master volume goes offline, you can promote a slave volume to be the master, and start using that volume for data access.
Run the following commands on the slave machine to promote it to be the master:
# gluster volume set VOLNAME geo-replication.indexing on
# gluster volume set VOLNAME changelog on
You can now configure applications to use the slave volume for I/O operations.

10.6.2. Failover and Failback

Red Hat Storage provides geo-replication failover and failback capabilities for disaster recovery. If the master goes offline, you can perform a failover procedure so that a slave can replace the master. When this happens, all the I/O operations, including reads and writes, are done on the slave which is now acting as the master. When the original master is back online, you can perform a failback procedure on the original slave so that it synchronizes the differences back to the original master.

Performing a Failover and Failback

  1. Create a new geo-replication session with the original slave as the new master, and the original master as the new slave. For more information on setting and creating geo-replication session, see Section 10.3.4, “Setting Up your Environment for Geo-replication Session”.
  2. Start the special synchronization mode to speed up the recovery of data from slave.
    # gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL config special-sync-mode recover
  3. Set a checkpoint to help verify the status of the data synchronization.
    # gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL config checkpoint now
  4. Start the new geo-replication session using the following command:
    # gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL start
  5. Monitor the checkpoint output using the following command, until the status displays: checkpoint as of <time of checkpoint creation> is completed at <time of completion>.
    # gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL status
  6. To resume the original master and original slave back to their previous roles, stop the I/O operations on the original slave, and using steps 3 and 5, ensure that all the data from the original slave is restored back to the original master. After the data from the original slave is restored back to the original master, stop the current geo-replication session (the failover session) between the original slave and original master, and resume the previous roles.
  7. Reset the options that were set for promoting the slave volume as the master volume by running the following commands:
    # gluster volume reset ORIGINAL_SLAVE_VOL geo-replication.indexing  # gluster volume reset ORIGINAL_SLAVE_VOL changelog
    For more information on promoting slave volume to be the master volume, see Section 10.6.1, “Promoting a Slave to Master”.