10.7. Disaster Recovery

Red Hat Gluster Storage provides geo-replication failover and failback capabilities for disaster recovery. If the master goes offline, you can perform a failover procedure so that a slave can replace the master. When this happens, all the I/O operations, including reads and writes, are done on the slave which is now acting as the master. When the original master is back online, you can perform a failback procedure on the original slave so that it synchronizes the differences back to the original master.

10.7.1. Failover: Promoting a Slave to Master

If the master volume goes offline, you can promote a slave volume to be the master, and start using that volume for data access.
Run the following commands on the slave machine to promote it to be the master:
# gluster volume set VOLNAME geo-replication.indexing on
# gluster volume set VOLNAME changelog on
For example
# gluster volume set slave-vol geo-replication.indexing on
volume set: success
# gluster volume set slave-vol changelog on
volume set: success
You can now configure applications to use the slave volume for I/O operations.

10.7.2.  Failback: Resuming Master and Slave back to their Original State

When the original master is back online, you can perform the following procedure on the original slave so that it synchronizes the differences back to the original master:
  1. Stop the existing geo-rep session from original master to orginal slave using the following command:
    # gluster volume geo-replication  ORIGINAL_MASTER_VOL ORIGINAL_SLAVE_HOST::ORIGINAL_SLAVE_VOL stop force
    For example,
    # gluster volume geo-replication  Volume1 example.com::slave-vol stop force
    Stopping geo-replication session between Volume1 and example.com::slave-vol has been successful
  2. Create a new geo-replication session with the original slave as the new master, and the original master as the new slave with force option. Detailed information on creating geo-replication session is available at: .
  3. Start the special synchronization mode to speed up the recovery of data from slave. This option adds capability to geo-replication to ignore the files created before enabling indexing option. With this option, geo-replication will synchronize only those files which are created after making Slave volume as Master volume.
    # gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL config special-sync-mode recover
    For example,
    # gluster volume geo-replication  slave-vol master.com::Volume1 config special-sync-mode recover
    geo-replication config updated successfully
    
  4. Start the new geo-replication session using the following command:
    # gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL start
    For example,
    # gluster volume geo-replication slave-vol master.com::Volume1 start
    Starting geo-replication session between slave-vol and master.com::Volume1 has been successful
    
  5. Stop the I/O operations on the original slave and set the checkpoint. By setting a checkpoint, synchronization information is available on whether the data that was on the master at that point in time has been replicated to the slaves.
    # gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL config checkpoint now
    For example,
    # gluster volume geo-replication slave-vol master.com::Volume1 config checkpoint now
    geo-replication config updated successfully
    
  6. Checkpoint completion ensures that the data from the original slave is restored back to the original master. But since the IOs were stopped at slave before checkpoint was set, we need to touch the slave mount for checkpoint to be completed
    # touch orginial_slave_mount 
    # gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL status detail
    For example,
    # touch /mnt/gluster/slavevol
    # gluster volume geo-replication slave-vol master.com::Volume1 status detail
    
  7. After the checkpoint is complete, stop and delete the current geo-replication session between the original slave and original master
    # gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL stop
    # gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL delete
    For example,
    # gluster volume geo-replication slave-vol master.com::Volume1 stop
    Stopping geo-replication session between slave-vol and master.com::Volume1 has been successful
    
    # gluster volume geo-replication slave-vol master.com::Volume1 delete
    geo-replication command executed successfully
    
  8. Reset the options that were set for promoting the slave volume as the master volume by running the following commands:
    # gluster volume reset ORIGINAL_SLAVE_VOL geo-replication.indexing force
    # gluster volume reset ORIGINAL_SLAVE_VOL changelog
    For example,
    # gluster volume reset slave-vol geo-replication.indexing force
    volume set: success
    
    # gluster volume reset slave-vol changelog
    volume set: success
    
  9. Resume the original roles by starting the geo-rep session from the original master using the following command:
    # gluster volume geo-replication ORIGINAL_MASTER_VOL ORIGINAL_SLAVE_HOST::ORIGINAL_SLAVE_VOL start 
    # gluster volume geo-replication Volume1 example.com::slave-vol start
    Starting geo-replication session between slave-vol  and master.com::Volume1 been successful