Chapter 2. Recovering from server loss with replication

If a server is severely disrupted or lost, having multiple replicas ensures you can create a replacement replica and quickly restore the former level of redundancy.

If your IdM topology contains an integrated Certificate Authority (CA), the steps for removing and replacing a damaged replica differ for the CA renewal master and other replicas.

2.1. Recovering from losing the CA renewal master

If the Certificate Authority (CA) renewal master is lost, you must first promote another CA replica to fulfill the CA renewal master role, and then deploy a replacement CA replica.

Prerequisites

  • Your deployment uses IdM’s internal Certificate Authority (CA).
  • Another replica in the environment has CA services installed.

Warning

An IdM deployment is unrecoverable if all of the following conditions are true:

  1. The CA renewal master has been lost.
  2. No other server has a CA installed.
  3. No backup of a replica with the CA role exists.

    It is critical to make backups from a replica with the CA role so certificate data is protected. For more information on creating and restoring from backups, see Preparing for data loss with IdM backups.
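
To check the second condition before a failure occurs, you can count the servers that currently hold the CA role. The following is a minimal sketch, not a definitive check: it assumes that you already hold an administrator ticket (kinit admin) and that ipa server-role-find prints one "Server name:" line per match. The function name count_ca_servers is hypothetical.

```shell
#!/usr/bin/env bash
# Print the number of servers that IdM reports with the "CA server" role enabled.
# Assumption: the command prints one "Server name:" line per matching server.
# If the ipa command fails (for example, no valid ticket), the result is 0.
count_ca_servers() {
    ipa server-role-find --role "CA server" --status enabled \
        | grep -c "Server name:"
}

# Example invocation:
#   count_ca_servers
```

A result of 1 means that the CA exists on a single server only, so losing that server without a backup leaves the deployment unrecoverable.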

Procedure

  1. Remove replication agreements to the lost CA renewal master. See Uninstalling an IdM server.
  2. Promote another CA replica in the environment to act as the new CA renewal master. See Changing and resetting IdM CA Renewal Master.
  3. Install a new CA replica to replace the lost one. See Installing an IdM replica with a CA.
  4. Update DNS to reflect changes in the replica topology. If IdM DNS is used, DNS service records are updated automatically.
  5. Verify IdM clients can reach IdM servers. See Adjusting IdM clients during recovery.
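
Steps 1 and 2 of this procedure can be sketched as the following shell functions. This is an illustration under stated assumptions rather than the documented procedure itself: it assumes that you run the commands on a surviving CA replica with an administrator ticket, and the function names and host names are hypothetical. Depending on the state of the topology, ipa server-del might require additional confirmation or options that are not shown here.

```shell
#!/usr/bin/env bash
# Sketch of procedure steps 1 and 2. All host names are placeholders.

# Step 1: remove the lost server and its replication agreements from the topology.
remove_lost_server() {
    ipa server-del "$1"
}

# Step 2: assign the CA renewal master role to a surviving CA replica.
promote_renewal_master() {
    ipa config-mod --ca-renewal-master-server "$1"
}

# Print the configuration line that names the current CA renewal master.
show_renewal_master() {
    ipa config-show | grep "CA renewal master"
}

# Example invocation (hypothetical host names):
#   remove_lost_server server.example.com
#   promote_renewal_master replica1.example.com
#   show_renewal_master
```

Running show_renewal_master after the promotion lets you confirm that the role moved to the intended replica before you install the replacement CA replica.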

Verification steps

  1. Test the Kerberos server on the new replica by retrieving a Kerberos ticket-granting ticket (TGT) as an IdM user.

    [root@master ~]# kinit admin
    Password for admin@EXAMPLE.COM:
    
    [root@master ~]# klist
    Ticket cache: KCM:0
    Default principal: admin@EXAMPLE.COM
    
    Valid starting       Expires              Service principal
    10/31/2019 15:51:37  11/01/2019 15:51:02  HTTP/master.example.com@EXAMPLE.COM
    10/31/2019 15:51:08  11/01/2019 15:51:02  krbtgt/EXAMPLE.COM@EXAMPLE.COM
  2. Test the Directory Server and SSSD configuration by retrieving user information.

    [root@master ~]# ipa user-show admin
      User login: admin
      Last name: Administrator
      Home directory: /home/admin
      Login shell: /bin/bash
      Principal alias: admin@EXAMPLE.COM
      UID: 1965200000
      GID: 1965200000
      Account disabled: False
      Password: True
      Member of groups: admins, trust admins
      Kerberos keys available: True
  3. Test the CA configuration with the ipa cert-show command.

    [root@master ~]# ipa cert-show 1
      Issuing CA: ipa
      Certificate: MIIEgjCCAuqgAwIBAgIjoSIP...
      Subject: CN=Certificate Authority,O=EXAMPLE.COM
      Issuer: CN=Certificate Authority,O=EXAMPLE.COM
      Not Before: Thu Oct 31 19:43:29 2019 UTC
      Not After: Mon Oct 31 19:43:29 2039 UTC
      Serial number: 1
      Serial number (hex): 0x1
      Revoked: False

2.2. Recovering from losing a regular replica

To replace a replica that is not the Certificate Authority (CA) renewal master, remove the lost replica from the topology and install a new replica in its place.

Procedure

  1. Remove replication agreements to the lost server. See Uninstalling an IdM server.
  2. Deploy a new replica with the desired services (CA, KRA, DNS). See Installing an IdM replica.
  3. Update DNS to reflect changes in the replica topology. If IdM DNS is used, DNS service records are updated automatically.
  4. Verify IdM clients can reach IdM servers. See Adjusting IdM clients during recovery.
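
For steps 3 and 4, one way to confirm that DNS no longer advertises the lost server is to inspect the Kerberos and LDAP service (SRV) records. The following is a minimal sketch that assumes the dig utility is installed and that your domain publishes the _kerberos._tcp and _ldap._tcp records. The function name srv_records_mention_host and the host names are hypothetical.

```shell
#!/usr/bin/env bash
# Return exit status 0 if the Kerberos or LDAP SRV records for DNS domain "$1"
# still list host "$2" as a target, and a nonzero status otherwise.
srv_records_mention_host() {
    local domain="$1" host="$2"
    {
        dig +short -t SRV "_kerberos._tcp.${domain}"
        dig +short -t SRV "_ldap._tcp.${domain}"
    } | awk -v h="${host}." '
        # Short SRV output is "priority weight port target."; compare the
        # target exactly so that similar host names do not match.
        tolower($4) == tolower(h) { found = 1 }
        END { exit !found }
    '
}

# Example invocation (hypothetical host names):
#   if srv_records_mention_host example.com server.example.com; then
#       echo "DNS still points clients at the lost server"
#   fi
```

The same function can confirm the opposite case: after the new replica is installed, its host name should appear in the records.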

Verification steps

  1. Test the Kerberos server on the new replica by retrieving a Kerberos ticket-granting ticket (TGT) as an IdM user.

    [root@newreplica ~]# kinit admin
    Password for admin@EXAMPLE.COM:
    
    [root@newreplica ~]# klist
    Ticket cache: KCM:0
    Default principal: admin@EXAMPLE.COM
    
    Valid starting       Expires              Service principal
    10/31/2019 15:51:37  11/01/2019 15:51:02  HTTP/master.example.com@EXAMPLE.COM
    10/31/2019 15:51:08  11/01/2019 15:51:02  krbtgt/EXAMPLE.COM@EXAMPLE.COM
  2. Test the Directory Server and SSSD configuration on the new replica by retrieving user information.

    [root@newreplica ~]# ipa user-show admin
      User login: admin
      Last name: Administrator
      Home directory: /home/admin
      Login shell: /bin/bash
      Principal alias: admin@EXAMPLE.COM
      UID: 1965200000
      GID: 1965200000
      Account disabled: False
      Password: True
      Member of groups: admins, trust admins
      Kerberos keys available: True

2.3. Recovering from losing multiple servers

If multiple servers are lost at the same time, determine whether the environment can be rebuilt by identifying which of the following five scenarios applies to your situation.
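
The choice between the five scenarios can be sketched as a small decision function. This is only an illustration of the order in which the questions can be asked: the function name recovery_scenario and its four arguments are hypothetical, and you must supply the answers yourself from what you know about the deployment.

```shell
#!/usr/bin/env bash
# Usage: recovery_scenario USES_CA SURVIVING_SERVERS RENEWAL_MASTER_OK SURVIVING_CA_REPLICAS
#   USES_CA               "yes" if the deployment uses the integrated CA, otherwise "no"
#   SURVIVING_SERVERS     number of IdM servers that are still working
#   RENEWAL_MASTER_OK     "yes" if the CA renewal master is unharmed, otherwise "no"
#   SURVIVING_CA_REPLICAS number of working servers that have the CA role
# Prints the number of the section in this chapter that applies.
recovery_scenario() {
    local uses_ca="$1" servers="$2" master_ok="$3" ca_replicas="$4"
    if [ "$servers" -eq 0 ]; then
        echo "2.3.5"    # no server survived
    elif [ "$uses_ca" != "yes" ]; then
        echo "2.3.1"    # CA-less deployment
    elif [ "$master_ok" = "yes" ]; then
        echo "2.3.2"    # CA renewal master is unharmed
    elif [ "$ca_replicas" -ge 1 ]; then
        echo "2.3.3"    # renewal master lost, another CA replica survives
    else
        echo "2.3.4"    # every CA replica is lost
    fi
}
```

For example, recovery_scenario yes 3 no 1 prints 2.3.3, because the CA renewal master is lost but one CA replica is still available.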

2.3.1. Recovering from losing multiple servers in a CA-less deployment

Servers in a CA-less deployment are all considered equal, so you can rebuild the environment by removing and replacing lost replicas in any order.

2.3.2. Recovering from losing multiple servers when the CA renewal master is unharmed

Prerequisites

  • Your deployment uses IdM’s internal Certificate Authority (CA).

Procedure

  • Replace the lost replicas. See Recovering from losing a regular replica.

2.3.3. Recovering from losing the CA renewal master and other servers

Prerequisites

  • Your deployment uses IdM’s internal Certificate Authority (CA).
  • At least one CA replica is unharmed.

Procedure

  1. Promote another CA replica to fulfill the CA renewal master role. See Recovering from losing the CA renewal master.
  2. Replace all other lost replicas. See Recovering from losing a regular replica.

2.3.4. Recovering from losing all CA replicas

Without any Certificate Authority (CA) replicas, the IdM environment has lost the ability to deploy additional replicas and rebuild itself.

Prerequisites

  • Your deployment uses IdM’s internal Certificate Authority (CA).

Procedure

  • This situation is a total loss.

2.3.5. Recovering from a total infrastructure loss

If all servers are lost at once, and there are no Virtual Machine (VM) snapshots or data backups to restore from, this situation is unrecoverable.

Procedure

  • This situation is a total loss.
