Chapter 3. Configuring data backup and recovery options

This chapter explains how to add disaster recovery capabilities to your Red Hat Hyperconverged Infrastructure for Virtualization deployment so that you can restore your cluster to a working state after a disk or server failure.

3.1. Prerequisites

3.1.1. Prerequisites for geo-replication

Be aware of the following requirements and limitations when configuring geo-replication:

Two different managers required
The source and destination volumes for geo-replication must be managed by different instances of Red Hat Virtualization Manager.

3.1.2. Prerequisites for failover and failback configuration

Versions must match between environments
Ensure that the primary and secondary environments have the same version of Red Hat Virtualization Manager, with identical data center compatibility versions, cluster compatibility versions, and PostgreSQL versions.
No virtual machine disks in the hosted engine storage domain
The storage domain used by the hosted engine virtual machine is not failed over, so any virtual machine disks in this storage domain will be lost.
Execute Ansible playbooks manually from a separate machine

Generate and execute Ansible playbooks manually from a separate machine that acts as an Ansible controller node. This node must have the ovirt-ansible-collection package, which provides all required disaster recovery Ansible roles.

Note

The ovirt-ansible-collection package is installed with the Hosted Engine virtual machine by default. However, during a disaster that affects the primary site, this virtual machine may be down. It is safe to use a machine that is outside the primary site to run this playbook, but for testing purposes these playbooks can be triggered from the Hosted Engine virtual machine.

3.2. Supported backup and recovery configurations

There are two supported ways to add disaster recovery capabilities to your Red Hat Hyperconverged Infrastructure for Virtualization deployment.

Configure backing up to a secondary volume only

Regularly synchronizing your data to a remote secondary volume helps to ensure that your data is not lost in the event of disk or server failure.

This option is suitable if the following statements are true of your deployment.

  • You require only a backup of your data for disaster recovery.
  • You do not require highly available storage.
  • You do not want to maintain a secondary cluster.
  • You are willing to manually restore your data and reconfigure your backup solution after a failure has occurred.

Follow the instructions in Configuring backup to a secondary volume to configure this option.

Configure failing over to and failing back from a secondary cluster

This option provides failover and failback capabilities in addition to backing up data on a remote volume. Configuring failover of your primary cluster’s operations and storage domains to a secondary cluster helps to ensure that your data remains available in event of disk or server failure in the primary cluster.

This option is suitable if the following statements are true of your deployment.

  • You require highly available storage.
  • You are willing to maintain a secondary cluster.
  • You do not want to manually restore your data or reconfigure your backup solution after a failure has occurred.

Follow the instructions in Configuring failover to and failback from a secondary cluster to configure this option.

Red Hat recommends that you configure at least a backup volume for production deployments.

3.3. Configuring backup to a secondary volume

This section covers how to back up a gluster volume to a secondary gluster volume using geo-replication.

To do this, you must:

  1. Ensure that all prerequisites are met.
  2. Create a suitable volume to use as a geo-replication target.
  3. Configure a geo-replication session between the source volume and the target volume.
  4. Schedule the geo-replication process.

3.3.1. Prerequisites

3.3.1.1. Enable shared storage on the source volume

Ensure that the volume you want to back up (the source volume) has shared storage enabled. Run the following command on any server that hosts the source volume to enable shared storage.

# gluster volume set all cluster.enable-shared-storage enable

Ensure that a gluster volume named gluster_shared_storage is created in the source cluster, and is mounted at /run/gluster/shared_storage on all the nodes in the source cluster. See Setting Up Shared Storage for further information.

3.3.1.2. Match network protocol

Ensure that all hosts use the same Internet Protocol version.

If the hosts for your source volume use IPv4, the hosts for the target volume must also use IPv4.

If the hosts for your source volume use IPv6, the hosts for the target volume must use IPv6. Additionally, configure geo-replication using FQDNs instead of IPv6 addresses to avoid Bug 1855965.

3.3.2. Create a suitable target volume for geo-replication

Prepare a secondary gluster volume to hold the geo-replicated copy of your source volume. This target volume should be in a separate cluster, hosted at a separate site, so that the risk of source and target volumes being affected by the same outages is minimised.

Ensure that the target volume for geo-replication has sharding enabled. Run the following command on any node that hosts the target volume to enable sharding on that volume.

# gluster volume set <volname> features.shard enable

3.3.3. Configuring geo-replication for backing up volumes

3.3.3.1. Creating a geo-replication session

A geo-replication session is required to replicate data from an active source volume to a passive target volume.

Important

Only rsync based geo-replication is supported with Red Hat Hyperconverged Infrastructure for Virtualization.

  1. Create a common pem pub file.

    Run the following command on a source node that has key-based SSH authentication without a password configured to the target nodes.

    # gluster system:: execute gsec_create
  2. Create the geo-replication session

    Run the following command to create a geo-replication session between the source and target volumes, using the created pem pub file for authentication.

    # gluster volume geo-replication <SOURCE_VOL> <TARGET_NODE>::<TARGET_VOL> create push-pem

    For example, the following command creates a geo-replication session from a source volume prodvol to a target volume called backupvol, which is hosted by backup.example.com.

    # gluster volume geo-replication prodvol backup.example.com::backupvol create push-pem

    By default this command verifies that the target volume is a valid target with available space. You can append the force option to the command to ignore failed verification.

  3. Configure a meta-volume

    This relies on the source volume having shared storage configured, as described in Prerequisites.

    # gluster volume geo-replication <SOURCE_VOL> <TARGET_HOST>::<TARGET_VOL> config use_meta_volume true
Important

Do not start the geo-replication session. Starting the geo-replication session begins replication from your source volume to your target volume.

3.3.3.2. Verifying creation of a geo-replication session

  1. Log in to the Administration Portal on any source node.
  2. Click StorageVolumes.
  3. Check the Info column for the geo-replication icon.

    If this icon is present, geo-replication has been configured for that volume.

    If this icon is not present, try synchronizing the volume.

3.3.3.3. Synchronizing volume state using the Administration Portal

  1. Log in to the Administration Portal.
  2. Click StorageVolumes.
  3. Select the volume that you want to synchronize.
  4. Click the Geo-replication sub-tab.
  5. Click Sync.

3.3.4. Scheduling regular backups using geo-replication

  1. Log in to the Administration Portal on any source node.
  2. Click StorageDomains.
  3. Click the name of the storage domain that you want to back up.
  4. Click the Remote Data Sync Setup subtab.
  5. Click Setup.

    The Setup Remote Data Synchronization window opens.

    1. In the Geo-replicated to field, select the backup target.
    2. In the Recurrence field, select a recurrence interval type.

      Valid values are WEEKLY with at least one weekday checkbox selected, or DAILY.

    3. In the Hours and Minutes field, specify the time to start synchronizing.

      Note

      This time is based on the Hosted Engine’s timezone.

    4. Click OK.
  6. Check the Events subtab for the source volume at the time you specified to verify that synchronization works correctly.

3.4. Configuring failover to and failback from a secondary cluster

This section covers how to configure your cluster to fail over to a remote secondary cluster in the event of server failure.

To do this, you must:

3.4.1. Creating a secondary cluster for failover

Install and configure a secondary cluster that can be used in place of the primary cluster in the event of failure.

This secondary cluster can be either of the following configurations:

Red Hat Hyperconverged Infrastructure
See Deploying Red Hat Hyperconverged Infrastructure for details.
Red Hat Gluster Storage configured for use as a Red Hat Virtualization storage domain
See Configuring Red Hat Virtualization with Red Hat Gluster Storage for details. Note that creating a storage domain is not necessary for this use case; the storage domain is imported as part of the failover process.

The storage on the secondary cluster must not be attached to a data center, so that it can be added to the secondary site’s data center during the failover process.

3.4.2. Creating a mapping file between source and target clusters

Follow this section to create a file that maps the storage in your source cluster to the storage in your target cluster.

Red Hat recommends that you create this file immediately after you first deploy your storage, and keep it up to date as your deployment changes. This helps to ensure that everything in your cluster fails over safely in the event of disaster.

  1. Create a playbook to generate the mapping file.

    Create a playbook that passes information about your cluster to the ovirt.ovirt.disaster_recovery role, using the site, username, password, and ca variables.

    Example playbook file: dr-ovirt-setup.yml

    ---
    - name: Collect mapping variables
      hosts: localhost
      connection: local
    
      vars:
        site: https://example.engine.redhat.com/ovirt-engine/api
        username: admin@internal
        password: my_password
        ca: /etc/pki/ovirt-engine/ca.pem
        var_file: disaster_recovery_vars.yml
    
      roles:
        - ovirt.ovirt.disaster_recovery

  2. Generate the mapping file by running the playbook with the generate_mapping tag.

    # ansible-playbook dr-ovirt-setup.yml --tags="generate_mapping"

    This creates the mapping file, disaster_recovery_vars.yml.

  3. Edit disaster_recovery_vars.yml and add information about the secondary cluster. Ensure that you only mention storage domains that have data synchronized to the secondary site; other storage domains can be removed.

    See Appendix A: Mapping File Attributes in the Red Hat Virtualization Disaster Recovery Guide for detailed information about attributes used in the mapping file.

3.4.3. Creating a failover playbook between source and target clusters

Create a playbook file to handle failover.

  1. Define a password file (for example passwords.yml) to store the Manager passwords for the primary and secondary site. For example:

    Example passwords.yml file

    ---
    # This file is in plain text, if you want to
    # encrypt this file, please execute following command:
    #
    # $ ansible-vault encrypt passwords.yml
    #
    # It will ask you for a password, which you must then pass to
    # ansible interactively when executing the playbook.
    #
    # $ ansible-playbook myplaybook.yml --ask-vault-pass
    #
    dr_sites_primary_password: primary_password
    dr_sites_secondary_password: secondary_password

    Note

    For extra security you can encrypt the password file. However, you will need to use the --ask-vault-pass parameter when running the playbook. See Working with files encrypted using Ansible Vault for more information.

  2. Create a playbook file that passes the lists of hyperconverged hosts to use as a failover source and target to the ovirt.ovirt.disaster_recovery role, using the dr_target_host and dr_source_map variables.

    Example playbook file: dr-rhv-failover.yml

    ---
    - name: Failover RHV
      hosts: localhost
      connection: local
      vars:
        dr_target_host: secondary
        dr_source_map: primary
      vars_files:
        - disaster_recovery_vars.yml
        - passwords.yml
      roles:
        - ovirt.ovirt.disaster_recovery

For information about executing failover, see Failing over to a secondary cluster.

3.4.4. Creating a failover cleanup playbook for your primary cluster

Create a playbook file that cleans up your primary cluster so that you can use it as a failback target.

Example playbook file: dr-cleanup.yml

---
- name: Clean RHV
  hosts: localhost
  connection: local
  vars:
    dr_source_map: primary
  vars_files:
    - disaster_recovery_vars.yml
  roles:
    - ovirt.ovirt.disaster_recovery

For information about executing failback, see Failing back to a primary cluster.

3.4.5. Create a failback playbook between source and target clusters

Create a playbook file that passes the lists of hyperconverged hosts to use as a failback source and target to the ovirt.ovirt.disaster_recovery role, using the dr_target_host and dr_source_map variables.

Example playbook file: dr-rhv-failback.yml

---
- name: Failback RHV
  hosts: localhost
  connection: local
  vars:
    dr_target_host: primary
    dr_source_map: secondary
  vars_files:
    - disaster_recovery_vars.yml
    - passwords.yml
  roles:
    - ovirt.ovirt.disaster_recovery

For information about executing failback, see Failing back to a primary cluster.