Chapter 13. Relocating an application between managed clusters

A relocation operation is very similar to a failover. Relocation is application based and uses the DRPlacementControl to trigger the move. The main difference is that, during a relocation (failback), the application is scaled down on the failoverCluster, so creating a NetworkFence is not required.

Procedure

  1. Remove the NetworkFence resource and disable fencing.

    Before a failback or relocate action can succeed, the NetworkFence for the Primary managed cluster must be deleted.

    1. Run this command on the Secondary managed cluster, replacing <cluster1> so that the name matches the NetworkFence YAML file created in the prior section.

      $ oc delete -f network-fence-<cluster1>.yaml

      Example output:

      networkfence.csiaddons.openshift.io "network-fence-ocp4perf1" deleted

    2. Reboot the OpenShift Container Platform nodes that were fenced.

      This step is required because some application Pods on the previously fenced cluster, in this case the Primary managed cluster, are in an unhealthy state (for example, CreateContainerError or CrashLoopBackOff). The simplest fix is to reboot all OpenShift worker nodes one at a time.

      Note

      The OpenShift Web Console dashboards and Overview page can also be used to assess the health of applications and the external storage. The detailed OpenShift Data Foundation dashboard is found by navigating to Storage → Data Foundation.

    3. Verify that all Pods are in a healthy state by running this command on the Primary managed cluster after all OpenShift nodes have rebooted and are in a Ready status. The command should list no Pods; only the header line remains.

      $ oc get pods -A | grep -Ev 'Running|Completed'

      Example output:

      NAMESPACE                                          NAME                                                              READY   STATUS      RESTARTS       AGE

      Important

      If there are Pods still in an unhealthy status because of severed storage communication, troubleshoot and resolve before continuing. Because the storage cluster is external to OpenShift, it also has to be properly recovered after a site outage for OpenShift applications to be healthy.
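
The Pod health check above can also be expressed as a count, which is easier to use in a script. The following is a minimal sketch and not part of the documented procedure: the function name count_unhealthy_pods is hypothetical, and it assumes the default column layout of `oc get pods -A --no-headers`, in which STATUS is the fourth column.

```shell
# Hypothetical helper: count Pods whose STATUS is neither Running nor Completed.
# Reads the output of `oc get pods -A --no-headers` on stdin, where the
# columns are NAMESPACE NAME READY STATUS RESTARTS AGE (STATUS is field 4).
count_unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { n++ } END { print n + 0 }'
}
```

On the Primary managed cluster it could be used as `oc get pods -A --no-headers | count_unhealthy_pods`. A result of 0 corresponds to the empty listing expected in the step above.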

  2. Modify DRPolicy to Unfenced status.

    For the ODR Hub operator to know that the NetworkFence has been removed from the Primary managed cluster, the DRPolicy must be updated for the newly unfenced cluster.

    1. Edit the DRPolicy on the Hub cluster and change <cluster1> (for example, ocp4perf1) from ManuallyFenced to Unfenced.

      $ oc edit drpolicy odr-policy

      Example:

      [...]
      spec:
        drClusterSet:
        - clusterFence: Unfenced  ## <-- Modify from ManuallyFenced to Unfenced
          name: ocp4perf1
          region: metro
          s3ProfileName: s3-primary
        - clusterFence: Unfenced
          name: ocp4perf2
          region: metro
          s3ProfileName: s3-secondary
      [...]

      Example output:

      drpolicy.ramendr.openshift.io/odr-policy edited

    2. Verify that the status of the DRPolicy in the Hub cluster has changed to Unfenced for the Primary managed cluster.

      $ oc get drpolicies.ramendr.openshift.io odr-policy -o yaml | grep -A 6 drClusters

      Example output:

        drClusters:
          ocp4perf1:
            status: Unfenced
            string: ocp4perf1
          ocp4perf2:
            status: Unfenced
            string: ocp4perf2
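
The same verification can be scripted instead of read from the grep output. This is a sketch under assumptions rather than a documented procedure: the function names are hypothetical, and the JSONPath `.status.drClusters.<cluster>.status` is inferred from the example output above, so confirm it against your own DRPolicy before relying on it. Cluster names that contain dots would need escaping in the JSONPath expression.

```shell
# Hypothetical helper: print the fence status the DRPolicy reports for one cluster.
# $1 = DRPolicy name, $2 = managed cluster name.
# Assumption: the status map lives at .status.drClusters.<cluster>.status.
drpolicy_fence_status() {
  oc get drpolicies.ramendr.openshift.io "$1" \
    -o jsonpath="{.status.drClusters.${2}.status}"
}

# Hypothetical helper: succeed only if every named cluster reports Unfenced.
# $1 = DRPolicy name, remaining arguments = managed cluster names.
all_unfenced() {
  af_policy=$1
  shift
  for af_cluster in "$@"; do
    [ "$(drpolicy_fence_status "$af_policy" "$af_cluster")" = "Unfenced" ] || return 1
  done
}
```

On the Hub cluster, `all_unfenced odr-policy ocp4perf1 ocp4perf2` would return a zero exit status once both clusters show Unfenced.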

  3. Modify DRPlacementControl to fail back.

    1. On the Hub cluster, navigate to Installed Operators and then click OpenShift DR Hub Operator.
    2. Click the DRPlacementControl tab.
    3. Click the DRPC busybox-drpc and then the YAML view.
    4. Change action to Relocate.

      [Figure: DRPlacementControl YAML view, showing where to change the action to Relocate]

    5. Click Save.
    6. Verify that the application busybox is now running in the Primary managed cluster. The failback is to the preferredCluster ocp4perf1 as specified in the YAML file, which is where the application was running before the failover operation.

      $ oc get pods,pvc -n busybox-sample

      Example output:

      NAME          READY   STATUS    RESTARTS   AGE
      pod/busybox   1/1     Running   0          60s
      
      NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
      persistentvolumeclaim/busybox-pvc   Bound    pvc-79f2a74d-6e2c-48fb-9ed9-666b74cfa1bb   5Gi        RWO            ocs-storagecluster-ceph-rbd   61s

    7. Verify whether busybox is still running in the Secondary managed cluster. The busybox application should no longer be running on this managed cluster.

      $ oc get pods,pvc -n busybox-sample

      Example output:

      No resources found in busybox-sample namespace.
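
The action change made in the web console can in principle also be applied from the command line on the Hub cluster. The sketch below is an assumption-laden alternative, not the documented method: the function names are hypothetical, the DRPC is assumed to live in the busybox-sample namespace on the Hub cluster, and the `.status.phase` field is assumed to report the progress of the relocation. Check each of these against your environment first.

```shell
# Hypothetical helper: set spec.action to Relocate on a DRPlacementControl.
# $1 = DRPC name, $2 = namespace of the DRPC on the Hub cluster (assumed).
relocate_drpc() {
  oc patch drplacementcontrols.ramendr.openshift.io "$1" -n "$2" \
    --type merge -p '{"spec":{"action":"Relocate"}}'
}

# Hypothetical helper: print the phase the DRPC currently reports.
# Assumption: the phase is exposed at .status.phase.
drpc_phase() {
  oc get drplacementcontrols.ramendr.openshift.io "$1" -n "$2" \
    -o jsonpath='{.status.phase}'
}
```

For the example in this chapter, that would be `relocate_drpc busybox-drpc busybox-sample` followed by `drpc_phase busybox-drpc busybox-sample`, and then the same Pod and PVC checks shown in the verification steps above.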

Important

Be aware of the known Metro-DR issues documented in the Known Issues section of the Release Notes.