Chapter 30. Configuring disaster recovery clusters

One method of providing disaster recovery for a high availability cluster is to configure two clusters. You can then configure one cluster as your primary site cluster, and the second cluster as your disaster recovery cluster.

Under normal circumstances, the primary cluster is running resources in production mode. The disaster recovery cluster also has all the resources configured and is either running them in demoted mode or not running them at all. For example, a database might run in promoted mode in the primary cluster and in demoted mode in the disaster recovery cluster. The database in this setup would be configured so that data is synchronized from the primary site to the disaster recovery site. This synchronization is done through the database configuration itself rather than through the pcs command interface.
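As a rough sketch, a promotable database resource might be created on each cluster with a command like the following. The resource name drdb, the ocf:heartbeat:pgsql agent, and the parameter values shown are illustrative assumptions, not part of this procedure; the actual agent, options, and replication settings depend on your database.

```shell
# Hypothetical example: create a promotable PostgreSQL resource.
# Run the equivalent command on both the primary and the DR cluster;
# cross-site data replication itself is configured in the database,
# not through pcs.
pcs resource create drdb ocf:heartbeat:pgsql \
    pgdata="/var/lib/pgsql/data" \
    op monitor interval=30s \
    promotable
```

The promotable keyword makes Pacemaker manage the resource as a promotable clone, so each cluster can run its copy in promoted or demoted mode independently.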

When the primary cluster goes down, users can use the pcs command interface to manually fail the resources over to the disaster recovery site. They can then log in to the disaster recovery site and promote and start the resources there. Once the primary cluster has recovered, users can use the pcs command interface to manually move resources back to the primary site.
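The manual failover and failback described above might look like the following. The resource name drdb is an assumption for illustration, and the exact commands depend on the resource agents and constraints in use.

```shell
# Hypothetical failover sketch. Log in to a node at the DR site and
# start the resources there (resource name "drdb" is an assumption):
pcs resource enable drdb

# After the primary site has recovered, stop the copy at the DR site
# and start the resources at the primary site again:
pcs resource disable drdb    # run on a DR-site node
pcs resource enable drdb     # run on a primary-site node
```

Because the two sites are independent clusters, each pcs command acts only on the cluster of the node where it is run.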

As of Red Hat Enterprise Linux 8.2, you can use the pcs command to display the status of both the primary and the disaster recovery site cluster from a single node on either site.

30.1. Considerations for disaster recovery clusters

When planning and configuring a disaster recovery site that you will manage and monitor with the pcs command interface, note the following considerations.

  • The disaster recovery site must be a cluster. This makes it possible to configure it with the same tools and similar procedures as the primary site.
  • The primary and disaster recovery clusters are created by independent pcs cluster setup commands.
  • The clusters and their resources must be configured so that the data is synchronized and failover is possible.
  • The cluster nodes in the recovery site cannot have the same names as the nodes in the primary site.
  • The pcs user hacluster must be authenticated for each node in both clusters on the node from which you will be running pcs commands.

30.2. Displaying status of recovery clusters (RHEL 8.2 and later)

To configure a primary and a disaster recovery cluster so that you can display the status of both clusters, perform the following procedure.

Note

Setting up a disaster recovery cluster does not automatically configure resources or replicate data. Those items must be configured manually by the user.

In this example:

  • The primary cluster will be named PrimarySite and will consist of the nodes z1.example.com and z2.example.com.
  • The disaster recovery site cluster will be named DRSite and will consist of the nodes z3.example.com and z4.example.com.

This example sets up a basic cluster with no resources or fencing configured.

  1. Authenticate all of the nodes that will be used for both clusters.

    [root@z1 ~]# pcs host auth z1.example.com z2.example.com z3.example.com z4.example.com -u hacluster -p password
    z1.example.com: Authorized
    z2.example.com: Authorized
    z3.example.com: Authorized
    z4.example.com: Authorized
  2. Create the cluster that will be used as the primary cluster and start cluster services for the cluster.

    [root@z1 ~]# pcs cluster setup PrimarySite z1.example.com z2.example.com --start
    {...}
    Cluster has been successfully set up.
    Starting cluster on hosts: 'z1.example.com', 'z2.example.com'...
  3. Create the cluster that will be used as the disaster recovery cluster and start cluster services for the cluster.

    [root@z1 ~]# pcs cluster setup DRSite z3.example.com z4.example.com --start
    {...}
    Cluster has been successfully set up.
    Starting cluster on hosts: 'z3.example.com', 'z4.example.com'...
  4. From a node in the primary cluster, set up the second cluster as the recovery site. The recovery site is identified by the name of one of its nodes.

    [root@z1 ~]# pcs dr set-recovery-site z3.example.com
    Sending 'disaster-recovery config' to 'z3.example.com', 'z4.example.com'
    z3.example.com: successful distribution of the file 'disaster-recovery config'
    z4.example.com: successful distribution of the file 'disaster-recovery config'
    Sending 'disaster-recovery config' to 'z1.example.com', 'z2.example.com'
    z1.example.com: successful distribution of the file 'disaster-recovery config'
    z2.example.com: successful distribution of the file 'disaster-recovery config'
  5. Check the disaster recovery configuration.

    [root@z1 ~]# pcs dr config
    Local site:
      Role: Primary
    Remote site:
      Role: Recovery
      Nodes:
        z3.example.com
        z4.example.com
  6. Check the status of the primary cluster and the disaster recovery cluster from a node in the primary cluster.

    [root@z1 ~]# pcs dr status
    --- Local cluster - Primary site ---
    Cluster name: PrimarySite
    
    WARNINGS:
    No stonith devices and stonith-enabled is not false
    
    Cluster Summary:
      * Stack: corosync
      * Current DC: z2.example.com (version 2.0.3-2.el8-2c9cea563e) - partition with quorum
      * Last updated: Mon Dec  9 04:10:31 2019
      * Last change:  Mon Dec  9 04:06:10 2019 by hacluster via crmd on z2.example.com
      * 2 nodes configured
      * 0 resource instances configured
    
    Node List:
      * Online: [ z1.example.com z2.example.com ]
    
    Full List of Resources:
      * No resources
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled
    
    
    --- Remote cluster - Recovery site ---
    Cluster name: DRSite
    
    WARNINGS:
    No stonith devices and stonith-enabled is not false
    
    Cluster Summary:
      * Stack: corosync
      * Current DC: z4.example.com (version 2.0.3-2.el8-2c9cea563e) - partition with quorum
      * Last updated: Mon Dec  9 04:10:34 2019
      * Last change:  Mon Dec  9 04:09:55 2019 by hacluster via crmd on z4.example.com
      * 2 nodes configured
      * 0 resource instances configured
    
    Node List:
      * Online: [ z3.example.com z4.example.com ]
    
    Full List of Resources:
      * No resources
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

For additional display options for a disaster recovery configuration, see the help screen for the pcs dr command.