Disaster Recovery for Ansible Automation Platform on Azure


Overview

Ansible Automation Platform on Azure can recover from a service-impacting event in an Azure region. This optional feature is enabled at the “Business Continuity” step when you install the managed application. Enabling it incurs additional Azure infrastructure costs. Existing customers can ask to have the feature enabled on their instance by opening a support request.

The disaster recovery feature replicates storage between a primary region and its assigned paired region. Paired regions are geographically distant from each other so that a single natural event is unlikely to affect both. For the list of region pairs, see Microsoft's Azure Cross-Region Replication documentation.

Disaster recovery is not synonymous with high availability. The platform is restored from a backup into a new deployment, so a loss of service and data can occur when the primary region is impacted.

How does disaster recovery work?

A nightly backup of the managed application is placed on Azure storage, which is replicated to the paired region. During recovery, this backup is loaded into a new deployment of Ansible Automation Platform in a non-impacted region. The time required to recover an instance depends on the amount of data being recovered and the availability of Azure resources.
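Because the backup is nightly, changes made after the most recent backup are the ones at risk. The following is a minimal sketch of that arithmetic, not part of the product. It assumes the backup completes once per day at a fixed time; the actual schedule is not stated here, so `backup_time` is a hypothetical parameter.

```python
from datetime import datetime, time, timedelta, timezone


def last_nightly_backup(incident: datetime, backup_time: time) -> datetime:
    """Return the most recent daily backup at or before the incident.

    Assumption: one backup per day finishing at `backup_time`, expressed in
    the same time zone as `incident`.
    """
    candidate = datetime.combine(incident.date(), backup_time, tzinfo=incident.tzinfo)
    if candidate > incident:
        # Today's backup has not run yet, so fall back to yesterday's.
        candidate -= timedelta(days=1)
    return candidate


def changes_at_risk(incident: datetime, backup_time: time) -> timedelta:
    """Return the span of time whose changes are missing from the last backup."""
    return incident - last_nightly_backup(incident, backup_time)


if __name__ == "__main__":
    # Hypothetical incident time and backup schedule.
    incident = datetime(2024, 3, 10, 19, 30, tzinfo=timezone.utc)
    print(changes_at_risk(incident, time(1, 0)))
```

Under these assumptions the window never exceeds 24 hours, and it is longest for an event that happens just before the next backup would have run.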

How does my application recover from an event?

Take the following steps if your managed application's region is experiencing a service-impacting event:

  1. Deploy a new instance of the managed application to a region of your choice. We recommend using the region paired with your primary region. You must deploy the new instance in the same Azure subscription as your primary instance.
  2. Contact Red Hat customer support, state that your managed application's region has failed and that the application needs to be recovered, and provide the following information:
    • Name of the instance impacted
    • Name of the new instance
    • Azure Subscription ID
    • Contact information for rapid collaboration
  3. Red Hat Site Reliability Engineers will prioritize the recovery operation. The time required for a full recovery depends on the availability of Azure resources and the amount of data to recover.
  4. A Red Hat representative will contact you, using the information supplied in your support request, when the process is complete. Red Hat will prioritize any issues with the new instance so that they are addressed promptly.
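The information requested in step 2 can be gathered ahead of time so that nothing is missing when the support case is opened. The sketch below is one hypothetical way to do that. `RecoveryRequest` and its field names are illustrative and are not part of any Red Hat or Azure tooling. The only format check relies on the fact that Azure subscription IDs are GUIDs.

```python
from dataclasses import dataclass, fields
from uuid import UUID


@dataclass(frozen=True)
class RecoveryRequest:
    """Details to include in the support request (hypothetical helper)."""

    impacted_instance: str  # name of the instance impacted
    new_instance: str       # name of the newly deployed instance
    subscription_id: str    # Azure Subscription ID used by both instances
    contact: str            # contact information for rapid collaboration

    def __post_init__(self) -> None:
        # Every item in the list is required, so reject blank values.
        for field in fields(self):
            if not getattr(self, field.name).strip():
                raise ValueError(f"{field.name} must not be empty")
        # UUID() raises ValueError if the subscription ID is not a valid GUID.
        UUID(self.subscription_id)

    def summary(self) -> str:
        """Render the details as plain text to paste into the support case."""
        return "\n".join([
            f"Name of the instance impacted: {self.impacted_instance}",
            f"Name of the new instance: {self.new_instance}",
            f"Azure Subscription ID: {self.subscription_id}",
            f"Contact information: {self.contact}",
        ])


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = RecoveryRequest(
        impacted_instance="aap-primary",
        new_instance="aap-recovery",
        subscription_id="00000000-0000-0000-0000-000000000000",
        contact="ops@example.com",
    )
    print(request.summary())
```

Validating at construction time means an incomplete request fails immediately instead of being discovered partway through a recovery.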

How can disaster recovery be tested?

Red Hat encourages customers to test disaster recovery procedures periodically. To schedule a test, submit a support request asking for a disaster recovery test. Tests are limited to one every six months.
