Chapter 14. Recovering from disaster

This chapter explains how to restore your cluster to a working state after a disk or server failure.

The procedures in this chapter work only if disaster recovery options were configured before the failure occurred. See Configuring backup and recovery options for details.

14.1. Manually restoring data from a backup volume

This section covers how to restore data from a remote backup volume to a freshly installed replacement deployment of Red Hat Hyperconverged Infrastructure for Virtualization.

To do this, you must first install and configure a replacement deployment according to the instructions in Deploying Red Hat Hyperconverged Infrastructure for Virtualization, and then follow the procedure in Restoring a volume from a geo-replicated backup.

14.1.1. Restoring a volume from a geo-replicated backup

  1. Install and configure a replacement Hyperconverged Infrastructure deployment

    For instructions, refer to Deploying Red Hat Hyperconverged Infrastructure for Virtualization: https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_virtualization/1.6/html/deploying_red_hat_hyperconverged_infrastructure_for_virtualization/.

  2. Import the backup of the storage domain

    From the new Hyperconverged Infrastructure deployment, in the Administration Portal:

    1. Click Storage → Domains.
    2. Click Import Domain. The Import Pre-Configured Domain window opens.
    3. In the Storage Type field, specify GlusterFS.
    4. In the Name field, specify a name for the new volume that will be created from the backup volume.
    5. In the Path field, specify the path to the backup volume.
    6. Click OK. The following warning appears, with any active data centers listed below:

      This operation might be unrecoverable and destructive!
      
      Storage Domain(s) are already attached to a Data Center.
      Approving this operation might cause data corruption if
      both Data Centers are active.

    7. Select the Approve operation check box and click OK.

  3. Determine a list of virtual machines to import

    1. Determine the imported domain’s identifier by running the following command:

      # curl -v -k -X GET -u "admin@internal:password" -H "Accept: application/xml" https://$ENGINE_FQDN/ovirt-engine/api/storagedomains/

      For example:

      # curl -v -k -X GET -u "admin@example.com:mybadpassword" -H "Accept: application/xml" https://10.0.2.1/ovirt-engine/api/storagedomains/

    2. Determine the list of unregistered virtual machines by running the following command:

      # curl -v -k -X GET -u "admin@internal:password" -H "Accept: application/xml" "https://$ENGINE_FQDN/ovirt-engine/api/storagedomains/DOMAIN_ID/vms;unregistered"

      For example:

      # curl -v -k -X GET -u "admin@example.com:mybadpassword" -H "Accept: application/xml" "https://10.0.2.1/ovirt-engine/api/storagedomains/5e1a37cf-933d-424c-8e3d-eb9e40b690a7/vms;unregistered"

  4. Perform a partial import of each virtual machine to the storage domain

    1. Determine cluster identifier

      The following command returns the cluster identifier.

      # curl -v -k -X GET -u "admin@internal:password" -H "Accept: application/xml" https://$ENGINE_FQDN/ovirt-engine/api/clusters/

      For example:

      # curl -v -k -X GET -u "admin@example.com:mybadpassword" -H "Accept: application/xml" https://10.0.2.1/ovirt-engine/api/clusters/

    2. Import the virtual machines

      The following command imports a virtual machine without requiring all disks to be available in the storage domain.

      # curl -v -k -u 'admin@internal:password' -H "Content-type: application/xml" -d '<action> <cluster id="CLUSTER_ID"></cluster> <allow_partial_import>true</allow_partial_import> </action>' "https://$ENGINE_FQDN/ovirt-engine/api/storagedomains/DOMAIN_ID/vms/VM_ID/register"

      For example:

      # curl -v -k -u 'admin@example.com:mybadpassword' -H "Content-type: application/xml" -d '<action> <cluster id="bf5a9e9e-5b52-4b0d-aeba-4ee4493f1072"></cluster> <allow_partial_import>true</allow_partial_import> </action>' "https://10.0.2.1/ovirt-engine/api/storagedomains/8d21980a-a50b-45e9-9f32-cd8d2424882e/vms/e164f8c6-769a-4cbd-ac2a-ef322c2c5f30/register"

      For further information, see the Red Hat Virtualization REST API Guide: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/html/rest_api_guide/.

  5. Migrate the partially imported disks to the new storage domain

    In the Administration Portal, click Storage → Disks, and then click the Move Disk option. Move the imported disks from the synced volume to the replacement cluster’s storage domain. For further information, see the Red Hat Virtualization Administration Guide.

  6. Attach the restored disks to the new virtual machines

    Follow the instructions in the Red Hat Virtualization Virtual Machine Management Guide to attach the replacement disks to each virtual machine.
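
Steps 3 and 4 can be combined in a script when many virtual machines need to be registered. The following is a minimal sketch, not a supported tool. The helper names extract_ids, register_url, and register_body are hypothetical, and extract_ids assumes that each opening element in the API response sits on one line with a double-quoted id attribute. A dedicated XML parser would be more robust.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of any Red Hat tooling.

# Print the id attribute of every <TAG ...> opening element read from standard input.
# Assumes each opening tag fits on one line and uses a double-quoted id attribute.
extract_ids() {
    tag="$1"
    grep -o "<${tag} [^>]*>" | sed -n 's/.* id="\([^"]*\)".*/\1/p'
}

# Build the URL that registers one unregistered virtual machine.
register_url() {
    engine="$1"; domain_id="$2"; vm_id="$3"
    printf 'https://%s/ovirt-engine/api/storagedomains/%s/vms/%s/register\n' \
        "$engine" "$domain_id" "$vm_id"
}

# Build the request body that allows a partial import into the given cluster.
register_body() {
    printf '<action> <cluster id="%s"></cluster> <allow_partial_import>true</allow_partial_import> </action>\n' "$1"
}

# Possible use (not run here); ENGINE_FQDN, DOMAIN_ID, and CLUSTER_ID must be set:
#   curl -s -k -u "admin@internal:password" -H "Accept: application/xml" \
#       "https://$ENGINE_FQDN/ovirt-engine/api/storagedomains/$DOMAIN_ID/vms;unregistered" |
#     extract_ids vm |
#     while read -r vm_id; do
#       curl -k -u "admin@internal:password" -H "Content-type: application/xml" \
#           -d "$(register_body "$CLUSTER_ID")" \
#           "$(register_url "$ENGINE_FQDN" "$DOMAIN_ID" "$vm_id")"
#     done
```

The same extract_ids helper could be pointed at the storage domain or cluster listings by passing storage_domain or cluster as the tag name, provided the responses follow the layout assumed above.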

14.2. Failing over to a secondary cluster

This section covers how to fail over from your primary cluster to a remote secondary cluster in the event of server failure.

  1. Configure failover to a remote cluster.
  2. Verify that the mapping file for the source and target clusters remains accurate.
  3. Run the failover playbook with the fail_over tag.

    # ansible-playbook dr-rhv-failover.yml --tags "fail_over"
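
The mapping file check and the playbook run can be wrapped together so that failover does not start when a required file is absent. This is a sketch under stated assumptions: the function name run_dr_playbook and the DRY_RUN switch are hypothetical, and the check only confirms that the files exist and are not empty. It does not validate the mappings themselves, which still need a manual review.

```shell
#!/bin/sh
# Hypothetical wrapper; the function name and the DRY_RUN switch are illustrative.

# Run a disaster recovery playbook with one tag, after checking that the
# playbook and the mapping file both exist and are not empty.
run_dr_playbook() {
    playbook="$1"; tag="$2"; mapping_file="$3"
    for f in "$playbook" "$mapping_file"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty file: $f" >&2
            return 1
        fi
    done
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "ansible-playbook $playbook --tags $tag"
    else
        ansible-playbook "$playbook" --tags "$tag"
    fi
}

# Possible use (not run here); MAPPING_FILE is a placeholder for your mapping file path:
#   DRY_RUN=1 run_dr_playbook dr-rhv-failover.yml fail_over MAPPING_FILE
#   run_dr_playbook dr-rhv-failover.yml fail_over MAPPING_FILE
```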

14.3. Failing back to a primary cluster

This section covers how to fail back from your secondary cluster to the primary cluster after you have corrected the cause of a server failure.

  1. Prepare the primary cluster for failback by running the cleanup playbook with the clean_engine tag.

    # ansible-playbook dr-cleanup.yml --tags "clean_engine"

  2. Verify that the mapping file for the source and target clusters remains accurate.
  3. Execute failback by running the failback playbook with the fail_back tag.

    # ansible-playbook dr-rhv-failback.yml --tags "fail_back"
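
Because failback depends on the cleanup step succeeding, the two playbook runs can be chained so that the second starts only if the first exits cleanly. The following sketch assumes the mapping file has already been verified. The function name fail_back_sequence and the RUNNER override are hypothetical, and the playbook file names are passed as arguments rather than fixed in the code.

```shell
#!/bin/sh
# Hypothetical helper; the function name and the RUNNER override are illustrative.

# Run the cleanup step, then the failback step, stopping if cleanup fails.
# RUNNER defaults to ansible-playbook; set RUNNER=echo to preview the arguments.
fail_back_sequence() {
    cleanup_playbook="$1"; failback_playbook="$2"
    runner="${RUNNER:-ansible-playbook}"
    "$runner" "$cleanup_playbook" --tags "clean_engine" || {
        echo "cleanup failed; failback not attempted" >&2
        return 1
    }
    "$runner" "$failback_playbook" --tags "fail_back"
}

# Possible use (not run here): pass the cleanup playbook from step 1 and the
# failback playbook from step 3.
#   fail_back_sequence CLEANUP_PLAYBOOK FAILBACK_PLAYBOOK
```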

14.4. Stopping a geo-replication session using RHV Manager

Stop a geo-replication session when you want to prevent data from being replicated from an active source volume to a passive target volume.

  1. Verify that data is not currently being synchronized

    Click the Tasks icon at the top right of the Manager, and review the Tasks page.

    Ensure that there are no ongoing tasks related to Data Synchronization.

    If data synchronization tasks are present, wait until they are complete.

  2. Stop the geo-replication session

    1. Click Storage → Volumes.
    2. Click the name of the volume that you want to stop geo-replicating.
    3. Click the Geo-replication subtab.
    4. Select the session that you want to stop, then click Stop.

14.5. Turning off scheduled backups by deleting the geo-replication schedule

You can stop scheduled backups via geo-replication by deleting the geo-replication schedule.

  1. Log in to the Administration Portal on any source node.
  2. Click Storage → Domains.
  3. Click the name of the storage domain whose scheduled backups you want to turn off.
  4. Click the Remote Data Sync Setup subtab.
  5. Click Setup.

    The Setup Remote Data Synchronization window opens.

  6. In the Recurrence field, select a recurrence interval type of NONE and click OK.
  7. (Optional) Remove the geo-replication session

    Run the following command from the geo-replication master node:

    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL delete

    You can also run this command with the reset-sync-time parameter. For further information about this parameter and deleting a geo-replication session, see Deleting a Geo-replication Session in the Red Hat Gluster Storage 3.4 Administration Guide.
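
The two forms of the delete command differ only in the trailing reset-sync-time parameter. The sketch below prints the command for review rather than running it. The function name georep_delete_cmd is hypothetical, and the volume and host values in the usage comment are placeholders. Gluster expects a geo-replication session to be stopped before it is deleted, so stop the session first as described in Stopping a geo-replication session using RHV Manager.

```shell
#!/bin/sh
# Hypothetical helper; the function name is illustrative.

# Print the gluster command that deletes a geo-replication session.
# A non-empty fourth argument appends the reset-sync-time parameter.
georep_delete_cmd() {
    master_vol="$1"; slave_host="$2"; slave_vol="$3"; reset="${4:-}"
    cmd="gluster volume geo-replication $master_vol ${slave_host}::${slave_vol} delete"
    if [ -n "$reset" ]; then
        cmd="$cmd reset-sync-time"
    fi
    echo "$cmd"
}

# Possible use (not run here), on the geo-replication master node:
#   georep_delete_cmd MASTER_VOL SLAVE_HOST SLAVE_VOL
#   georep_delete_cmd MASTER_VOL SLAVE_HOST SLAVE_VOL reset
```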