Chapter 1. Introduction to Regional-DR

Disaster recovery is the ability to recover and continue business critical applications from natural or human created disasters. It is the overall business continuance strategy of any major organization as designed to preserve the continuity of business operations during major adverse events.

Regional-DR capability provides volume persistent data and metadata replication across sites that are geographically dispersed. In the public cloud these would be akin to protecting from a region failure. Regional-DR ensures business continuity during the unavailability of a geographical region, accepting some loss of data in a predictable amount. This is usually expressed at Recovery Point Objective (RPO) and Recovery Time Objective (RTO).

  • RPO is a measure of how frequently you take backups or snapshots of persistent data. In practice, the RPO indicates the amount of data that will be lost or need to be reentered after an outage.
  • RTO is the amount of downtime a business can tolerate. The RTO answers the question, “How long can it take for our system to recover after we were notified of a business disruption?”

The intent of this guide is to detail the steps and commands necessary for configuring your infrastructure for enabling disaster recovery.

1.1. Components of Regional-DR solution

Regional-DR is composed of Red Hat Advanced Cluster Management for Kubernetes (RHACM) and OpenShift Data Foundation components to provide application and data mobility across OpenShift Container Platform clusters.

Red Hat Advanced Cluster Management for Kubernetes (RHACM)

RHACM provides the ability to manage multiple clusters and application lifecycles. Hence, it serves as a control plane in a multi-cluster environment.

RHACM is split into two parts:

  • RHACM Hub: components that run on the multi-cluster control plane
  • Managed clusters: components that run on the clusters that are managed

For more information about this product, see RHACM documentation and the RHACM “Managing Applications” documentation.

OpenShift Data Foundation

OpenShift Data Foundation provides the ability to provision and manage storage for stateful applications in an OpenShift Container Platform cluster.

OpenShift Data Foundation is backed by Ceph as the storage provider, whose lifecycle is managed by Rook in the OpenShift Data Foundation component stack. Ceph-CSI provides the provisioning and management of Persistent Volumes for stateful applications.

OpenShift Data Foundation stack is enhanced with the ability to:

  • Enable pools for mirroring
  • Automatically mirror images across RBD pools
  • Provides csi-addons to manage per Persistent Volume Claim mirroring

OpenShift DR

OpenShift DR is a disaster-recovery orchestrator for stateful applications across a set of peer OpenShift clusters which are deployed and managed using RHACM and provides cloud-native interfaces to orchestrate the life-cycle of an application’s state on Persistent Volumes. These include:

  • Protecting an application state relationship across OpenShift clusters
  • Failing over an application’s state to a peer cluster on unavailability of the currently deployed cluster
  • Relocate an application’s state to the previously deployed cluster

OpenShift DR is split into three components:

  • ODF Multicluster Orchestrator: Installed on the multi-cluster control plane (RHACM Hub), creates a bootstrap token and exchanges this token between the managed clusters.
  • OpenShift DR Hub Operator: Installed on the hub cluster to manage failover and relocation for applications.
  • OpenShift DR Cluster Operator: Installed on each managed cluster to manage the lifecycle of all PVCs of an application.

1.2. Regional-DR deployment workflow

This section provides an overview of the steps required to configure and deploy Regional-DR capabilities using OpenShift Data Foundation version 4.9 and RHACM version 2.4 across two distinct OpenShift Container Platform clusters. In addition to two managed clusters, a third OpenShift Container Platform cluster will be required to deploy the Advanced Cluster Management hub solution.

To configure your infrastructure, perform the below steps in the order given:

  1. Ensure you meet each of the Regional-DR requirements. See Requirements for enabling Regional-DR.
  2. Configure multisite storage replication by creating the mirroring relationship between two OpenShift Data Foundation managed clusters. See Configuring multisite storage replication.
  3. Create a VolumeReplicationClass resource on each managed cluster to configure the replication schedule (for example: replicate between peers every 5 minutes). See Creating VolumeReplicationClass resource.
  4. Create a mirroring StorageClass resource on each managed cluster that supports new imageFeatures for block volumes that have mirroring enabled. See Creating mirroring StorageClass resource.
  5. Install the OpenShift DR Cluster Operator on the managed clusters and create the required object buckets, secrets and configmaps. See Installing OpenShift DR Cluster Operator on Managed clusters.
  6. Install the OpenShift DR Hub Operator on the Hub cluster and create the required object buckets, secrets and configmap. See Installing OpenShift DR Hub Operator on Hub cluster.
  7. Create the DRPolicy resource on the hub cluster which is used to deploy, failover, and relocate the workloads across managed clusters. See Creating Disaster Recovery Policy on Hub cluster.