Chapter 1. Introduction to Regional-DR

Disaster recovery is the ability to recover and continue business critical applications from natural or human created disasters. It is the overall business continuance strategy of any major organization as designed to preserve the continuity of business operations during major adverse events.

Regional-DR capability provides volume persistent data and metadata replication across sites that are geographically dispersed. In the public cloud these would be similar to protecting from a region failure. Regional-DR ensures business continuity during the unavailability of a geographical region, accepting some loss of data in a predictable amount. This is usually expressed at Recovery Point Objective (RPO) and Recovery Time Objective (RTO).

  • RPO is a measure of how frequently you take backups or snapshots of persistent data. In practice, the RPO indicates the amount of data that will be lost or need to be reentered after an outage.
  • RTO is the amount of downtime a business can tolerate. The RTO answers the question, “How long can it take for our system to recover after we were notified of a business disruption?”

The intent of this guide is to detail the steps and commands necessary for configuring your infrastructure for enabling disaster recovery.

1.1. Components of Regional-DR solution

Regional-DR is composed of Red Hat Advanced Cluster Management for Kubernetes (RHACM) and OpenShift Data Foundation components to provide application and data mobility across OpenShift Container Platform clusters.

Red Hat Advanced Cluster Management for Kubernetes

Red Hat Advanced Cluster Management provides the ability to manage multiple clusters and application lifecycles. Hence, it serves as a control plane in a multi-cluster environment.

RHACM is split into two parts:

  • RHACM Hub: includes component that run on the multi-cluster control plane.
  • Managed clusters: includes components that run on the clusters that are managed.

For more information about this product, see RHACM documentation and the RHACM “Managing Applications” documentation.

OpenShift Data Foundation

OpenShift Data Foundation provides the ability to provision and manage storage for stateful applications in an OpenShift Container Platform cluster.

OpenShift Data Foundation is backed by Ceph as the storage provider, whose lifecycle is managed by Rook in the OpenShift Data Foundation component stack. Ceph-CSI provides the provisioning and management of Persistent Volumes for stateful applications.

OpenShift Data Foundation stack is now enhanced with the following abilities:

  • Enable pools for mirroring
  • Automatically mirror images across RBD block pools
  • Provides csi-addons to manage per Persistent Volume Claim (PVC) mirroring

OpenShift DR

OpenShift DR is a disaster recovery orchestrator for stateful applications across a set of peer OpenShift clusters which are deployed and managed using RHACM and provides cloud-native interfaces to orchestrate the life-cycle of an application’s state on Persistent Volumes. These include:

  • Protecting an application state relationship across OpenShift clusters
  • Failing over an application’s state to a peer cluster
  • Relocate an application’s state to the previously deployed cluster

OpenShift DR is split into three components:

  • ODF Multicluster Orchestrator: Installed on the multi-cluster control plane (RHACM Hub), it also performs the following actions:

    • Creates a bootstrap token and exchanges this token between the managed clusters.
    • Enables mirroring for the default CephBlockPool on the managed clusters.
    • Creates an object bucket using Multicloud Object Gateway (MCG) on each managed cluster for PVC and PV metadata.
    • Creates a Secret for each new object bucket that has the keys for bucket access on the Hub cluster in the openshift-dr-system project.
    • Creates a VolumeReplicationClass on the Primary managed cluster and the Secondary managed cluster for each schedulingIntervals (e.g. 5m, 15m, 30m).
    • Modifies the ramen-hub-operator-config ConfigMap on the Hub cluster and adds the s3StoreProfiles entries.
  • OpenShift DR Hub Operator: Installed on the hub cluster to manage failover and relocation for applications.
  • OpenShift DR Cluster Operator: Installed on each managed cluster to manage the lifecycle of all PVCs of an application.

1.2. Regional-DR deployment workflow

This section provides an overview of the steps required to configure and deploy Regional-DR capabilities using OpenShift Data Foundation version 4.10 and RHACM latest version across two distinct OpenShift Container Platform clusters. In addition to two managed clusters, a third OpenShift Container Platform cluster will be required to deploy the Advanced Cluster Management.

To configure your infrastructure, perform the below steps in the order given:

  1. Ensure you meet each of the Regional-DR requirements which includes RHACM operator installation, creation or importing of OpenShift Container Platform into RHACM hub and network configuration. See Requirements for enabling Regional-DR.
  2. Install OpenShift Data Foundation 4.10 on Primary and Secondary managed clusters. See Installing OpenShift Data Foundation on managed clusters.
  3. Install the Openshift DR Hub Operator on the Hub cluster. See Installing OpenShift DR Hub Operator on Hub cluster.
  4. Configure multisite storage replication by creating the mirroring relationship between two OpenShift Data Foundation managed clusters. See Configuring multisite storage replication.
  5. Create a mirroring StorageClass resource on each managed cluster that supports new imageFeatures for block volumes that have mirroring enabled. See Creating mirroring StorageClass resource.
  6. Create the DRPolicy resource on the hub cluster which is used to deploy, failover, and relocate the workloads across managed clusters. See Creating Disaster Recovery Policy on Hub cluster.

    Note

    There can be more than a single policy.

  7. Enable automatic installation of the OpenShift DR Cluster operator and automatic transfer of S3 secrets on the managed clusters. For instructions, see Enabling automatic install of OpenShift DR cluster operator and Enabling automatic transfer of S3 secrets on managed clusters.
  8. Create a sample application using RHACM console for testing failover and relocation testing. For instructions, see Creating sample application, application failover and relocating an application between managed clusters.