Chapter 12. Application failover between managed clusters

This section provides instructions on how to fail over the busybox sample application. The failover method for Metro-DR is application based: each application to be protected in this manner must have a corresponding DRPlacementControl resource and a PlacementRule resource created in the application namespace, as described in the Create Sample Application for DR testing section.

Procedure

  1. Create NetworkFence resource and enable Fencing.

    Specify the list of CIDR blocks or IP addresses on which the network fencing operation will be performed. In this case, this is the EXTERNAL-IP of every OpenShift node in the cluster that must be fenced off from the external RHCS cluster.

    1. Execute this command to get the IP addresses for the Primary managed cluster.

      $ oc get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}'

      Example output:

      10.70.56.118
      10.70.56.193
      10.70.56.154
      10.70.56.242
      10.70.56.136
      10.70.56.99
      Note

      Collect the current IP addresses of all OpenShift nodes before there is a site outage. A best practice is to create the NetworkFence YAML file in advance and keep it available and up to date for a disaster recovery event.

      The IP addresses for all nodes will be added to the NetworkFence example resource as shown below. This example shows six nodes, but your cluster might have more. A scripted sketch for building this file from the node list follows the example resource.

      apiVersion: csiaddons.openshift.io/v1alpha1
      kind: NetworkFence
      metadata:
        name: network-fence-<cluster1>
      spec:
        driver: openshift-storage.rbd.csi.ceph.com
        cidrs:
          -  <IP_Address1>/32
          -  <IP_Address2>/32
          -  <IP_Address3>/32
          -  <IP_Address4>/32
          -  <IP_Address5>/32
          -  <IP_Address6>/32
          [...]
        secret:
          name: rook-csi-rbd-provisioner
          namespace: openshift-storage
        parameters:
          clusterID: openshift-storage
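
      The following is a minimal shell sketch, not part of the official procedure, that builds this file from the node EXTERNAL-IP addresses returned by the command above. The CLUSTER_NAME placeholder and the output file name are assumptions that you must adjust for your environment.

      # Hypothetical helper: generate network-fence-<cluster1>.yaml from the
      # EXTERNAL-IP of every node in the cluster that is to be fenced.
      # Run it before an outage and keep the generated file up to date.
      CLUSTER_NAME="<cluster1>"          # replace with the RHACM cluster name
      OUT="network-fence-${CLUSTER_NAME}.yaml"

      {
        echo "apiVersion: csiaddons.openshift.io/v1alpha1"
        echo "kind: NetworkFence"
        echo "metadata:"
        echo "  name: network-fence-${CLUSTER_NAME}"
        echo "spec:"
        echo "  driver: openshift-storage.rbd.csi.ceph.com"
        echo "  cidrs:"
        # one /32 CIDR entry per node ExternalIP
        for ip in $(oc get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}'); do
          echo "    - ${ip}/32"
        done
        echo "  secret:"
        echo "    name: rook-csi-rbd-provisioner"
        echo "    namespace: openshift-storage"
        echo "  parameters:"
        echo "    clusterID: openshift-storage"
      } > "${OUT}"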
    2. In the YAML file example above, modify the IP addresses and replace <cluster1> with the cluster name of the Primary managed cluster as found in RHACM. Save the file as network-fence-<cluster1>.yaml.

      Important

      Prior to failover, the NetworkFence must be created from the managed cluster opposite the one where the application is currently running. In this case, that is the Secondary managed cluster.

      $ oc create -f network-fence-<cluster1>.yaml

      Example output:

      networkfences.csiaddons.openshift.io/network-fence-ocp4perf1 created
      Important

      After the NetworkFence is created, all communication from applications to the OpenShift Data Foundation storage will fail, and some Pods on the cluster that is now fenced will be in an unhealthy state (for example, CreateContainerError or CrashLoopBackOff).

    3. In the same cluster where the NetworkFence was created, verify that its status is Succeeded. Replace <cluster1> with the correct cluster name.

      $ export NETWORKFENCE=network-fence-<cluster1>
      $ oc get networkfences.csiaddons.openshift.io/$NETWORKFENCE -n openshift-dr-system -o jsonpath='{.status.result}{"\n"}'

      Example output:

      Succeeded
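
      During a real outage the fencing operation can take a short time to complete. The following optional polling sketch, not part of the official procedure, repeats the query above until the result is Succeeded; the 10-second interval is an arbitrary assumption.

      # Poll the NetworkFence status until it reports Succeeded.
      export NETWORKFENCE=network-fence-<cluster1>
      until [ "$(oc get networkfences.csiaddons.openshift.io/$NETWORKFENCE -n openshift-dr-system -o jsonpath='{.status.result}')" = "Succeeded" ]; do
        echo "Waiting for $NETWORKFENCE to report Succeeded ..."
        sleep 10
      done
      echo "$NETWORKFENCE reports Succeeded"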
  2. Modify DRPolicy for the fenced cluster.

    1. Edit the DRPolicy on the Hub cluster and change the clusterFence value for <cluster1> (for example, ocp4perf1) from Unfenced to ManuallyFenced.

      $ oc edit drpolicy odr-policy

      Modify the drClusterSet section as shown:

      [...]
      spec:
        drClusterSet:
        - clusterFence: ManuallyFenced  ## <-- Modify from Unfenced to ManuallyFenced
          name: ocp4perf1
          region: metro
          s3ProfileName: s3-primary
        - clusterFence: Unfenced
          name: ocp4perf2
          region: metro
          s3ProfileName: s3-secondary
      [...]

      Example output:

      drpolicy.ramendr.openshift.io/odr-policy edited
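
      As an alternative to the interactive edit, the same change can be scripted. The following sketch assumes that the Primary managed cluster is the first entry (index 0) in the drClusterSet list, as in the example above; verify the index in your DRPolicy before using it.

      # Set clusterFence for the first drClusterSet entry to ManuallyFenced.
      $ oc patch drpolicy odr-policy --type=json \
          -p='[{"op": "replace", "path": "/spec/drClusterSet/0/clusterFence", "value": "ManuallyFenced"}]'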
    2. Validate that the DRPolicy status on the Hub cluster has changed to Fenced for the Primary managed cluster.

      $ oc get drpolicies.ramendr.openshift.io odr-policy -o yaml | grep -A 6 drClusters

      Example output:

        drClusters:
          ocp4perf1:
            status: Fenced
            string: ocp4perf1
          ocp4perf2:
            status: Unfenced
            string: ocp4perf2
  3. Modify DRPlacementControl to fail over.

    1. On the Hub cluster, navigate to Installed Operators and then click OpenShift DR Hub Operator.
    2. Click the DRPlacementControl tab.
    3. Click DRPC busybox-drpc and then click the YAML view.
    4. Add the action and failoverCluster details as shown in the screenshot below and in the sketch that follows it. The failoverCluster must be the ACM cluster name for the Secondary managed cluster.

      Figure: DRPlacementControl add action Failover (screenshot showing where to add the action Failover in the YAML view)
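
      The following is a minimal sketch of the relevant spec fields for the DRPlacementControl created for the sample application; the apiVersion shown is an assumption for the installed OpenShift DR Hub Operator version, and all other existing fields in the resource must be left unchanged.

      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPlacementControl
      metadata:
        name: busybox-drpc
        namespace: busybox-sample
      spec:
        ## existing fields such as drPolicyRef, placementRef and pvcSelector
        ## remain unchanged; only the two fields below are added
        action: Failover                 ## trigger the failover
        failoverCluster: ocp4perf2       ## ACM cluster name of the Secondary managed cluster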

    5. Click Save.
  4. Verify that the busybox application is now running in the Secondary managed cluster, which is the failover cluster ocp4perf2 specified in the DRPlacementControl YAML.

    $ oc get pods,pvc -n busybox-sample

    Example output:

    NAME          READY   STATUS    RESTARTS   AGE
    pod/busybox   1/1     Running   0          35s
    
    NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
    persistentvolumeclaim/busybox-pvc   Bound    pvc-79f2a74d-6e2c-48fb-9ed9-666b74cfa1bb   5Gi        RWO            ocs-storagecluster-ceph-rbd   35s
  5. Verify that busybox is no longer running on the Primary managed cluster.

    $ oc get pods,pvc -n busybox-sample

    Example output:

    No resources found in busybox-sample namespace.
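
    In addition to the checks on the managed clusters, the result of the failover can also be confirmed from the Hub cluster by querying the DRPlacementControl status. This check is not part of the official procedure; the phase value FailedOver reflects the upstream Ramen status field and may differ between versions.

    # On the Hub cluster, query the DRPC phase; after a successful failover the
    # expected phase is FailedOver.
    $ oc get drplacementcontrols.ramendr.openshift.io busybox-drpc -n busybox-sample \
        -o jsonpath='{.status.phase}{"\n"}'
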
Important

Be aware of known Metro-DR issues as documented in the Known Issues section of the Release Notes.