Chapter 2. Recovering from server loss with replication
If a server is severely disrupted or lost, having multiple replicas ensures you can create a replacement replica and quickly restore the former level of redundancy.
If your IdM topology contains an integrated Certificate Authority (CA), the steps for removing and replacing a damaged replica differ for the CA renewal server and other replicas.
2.1. Recovering from losing the CA renewal server
If the Certificate Authority (CA) renewal server is lost, you must first promote another CA replica to fulfill the CA renewal server role, and then deploy a replacement CA replica.
Prerequisites
- Your deployment uses IdM’s internal Certificate Authority (CA).
- Another replica in the environment has CA services installed.
An IdM deployment is unrecoverable if all of the following are true:
- The CA renewal server has been lost.
- No other server has a CA installed.
- No backup of a replica with the CA role exists.
It is critical to make backups from a replica with the CA role so certificate data is protected. For more information on creating and restoring from backups, see Preparing for data loss with IdM backups.
Procedure
- Remove replication agreements to the lost CA renewal server. See Uninstalling an IdM server.
- Promote another CA replica in the environment to act as the new CA renewal server. See Changing and resetting IdM CA Renewal Master.
- Install a new CA replica to replace the lost one. See Installing an IdM replica with a CA.
- Update DNS to reflect changes in the replica topology. If IdM DNS is used, DNS service records are updated automatically.
- Verify IdM clients can reach IdM servers. See Adjusting IdM clients during recovery.
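The procedure above can be sketched as a command sequence. This is a minimal sketch rather than a definitive runbook: the function name `plan_ca_renewal_recovery` is hypothetical, both host names are placeholders supplied by the caller, and the function only prints the commands so that you can review them against the linked procedures before running anything. The `ipa-replica-install` step will usually need further options that depend on your environment.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) the commands for replacing a lost
# CA renewal server. Both host names are placeholders supplied by the caller.
plan_ca_renewal_recovery() {
    local lost="$1" new_renewal="$2"
    if [ -z "$lost" ] || [ -z "$new_renewal" ]; then
        echo "usage: plan_ca_renewal_recovery LOST_SERVER NEW_RENEWAL_SERVER" >&2
        return 1
    fi
    # On a surviving server: remove the lost server and its replication agreements.
    echo "ipa server-del $lost"
    # On a surviving server: promote a surviving CA replica to CA renewal server.
    echo "ipa config-mod --ca-renewal-master-server $new_renewal"
    # On a surviving server: review which server now holds the renewal role.
    echo "ipa config-show"
    # On the replacement host: install a new replica that includes the CA service.
    echo "ipa-replica-install --setup-ca"
}
```

For example, `plan_ca_renewal_recovery server.example.com replica.example.com` prints the four commands in the order in which the procedure lists the corresponding steps.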
Verification steps
Test the Kerberos server on the new replica by successfully retrieving a Kerberos Ticket-Granting-Ticket as an IdM user.
[root@server ~]# kinit admin
Password for admin@EXAMPLE.COM:

[root@server ~]# klist
Ticket cache: KCM:0
Default principal: admin@EXAMPLE.COM

Valid starting       Expires              Service principal
10/31/2019 15:51:37  11/01/2019 15:51:02  HTTP/server.example.com@EXAMPLE.COM
10/31/2019 15:51:08  11/01/2019 15:51:02  krbtgt/EXAMPLE.COM@EXAMPLE.COM
Test the Directory Server and SSSD configuration by retrieving user information.
[root@server ~]# ipa user-show admin
  User login: admin
  Last name: Administrator
  Home directory: /home/admin
  Login shell: /bin/bash
  Principal alias: admin@EXAMPLE.COM
  UID: 1965200000
  GID: 1965200000
  Account disabled: False
  Password: True
  Member of groups: admins, trust admins
  Kerberos keys available: True
Test the CA configuration with the ipa cert-show command.
[root@server ~]# ipa cert-show 1
  Issuing CA: ipa
  Certificate: MIIEgjCCAuqgAwIBAgIjoSIP...
  Subject: CN=Certificate Authority,O=EXAMPLE.COM
  Issuer: CN=Certificate Authority,O=EXAMPLE.COM
  Not Before: Thu Oct 31 19:43:29 2019 UTC
  Not After: Mon Oct 31 19:43:29 2039 UTC
  Serial number: 1
  Serial number (hex): 0x1
  Revoked: False
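The three checks above can also be run together once a ticket has been obtained. The following is a sketch under stated assumptions: the function name `verify_ca_replica` is hypothetical, it assumes you have already run `kinit admin` interactively, and it assumes that a certificate with serial number 1 exists, as in the example output. It uses `klist -s`, which prints nothing and reports ticket validity through its exit status.

```shell
#!/bin/bash
# Hypothetical helper: run the Kerberos, directory, and CA checks in one pass.
# Assumes a Kerberos ticket was already obtained with `kinit admin`.
verify_ca_replica() {
    local failed=0
    # klist -s is silent; a non-zero exit status means no valid ticket.
    if klist -s; then
        echo "kerberos: ok"
    else
        echo "kerberos: FAILED"; failed=1
    fi
    # Retrieving a user entry exercises the Directory Server.
    if ipa user-show admin >/dev/null 2>&1; then
        echo "directory: ok"
    else
        echo "directory: FAILED"; failed=1
    fi
    # Retrieving certificate serial 1 exercises the CA.
    if ipa cert-show 1 >/dev/null 2>&1; then
        echo "ca: ok"
    else
        echo "ca: FAILED"; failed=1
    fi
    return "$failed"
}
```

The function returns a non-zero status if any check fails, so it can gate later steps in a script. It does not replace reading the full command output when a check fails.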
Additional resources
- For more information about the IdM CA renewal server, see Using IdM CA renewal server.
2.2. Recovering from losing a regular replica
To replace a replica that is not the Certificate Authority (CA) renewal server, remove the lost replica from the topology and install a new replica in its place.
Prerequisites
- The CA renewal server is operating properly. If the CA renewal server has been lost, see Recovering from losing the CA renewal server.
Procedure
- Remove replication agreements to the lost server. See Uninstalling an IdM server.
- Deploy a new replica with the desired services (CA, KRA, DNS). See Installing an IdM replica.
- Update DNS to reflect changes in the replica topology. If IdM DNS is used, DNS service records are updated automatically.
- Verify IdM clients can reach IdM servers. See Adjusting IdM clients during recovery.
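As an illustration of the deployment step above, the desired services map onto `ipa-replica-install` options. This sketch is an assumption-laden convenience, not part of the documented procedure: the function name `replica_install_command` is hypothetical, it only prints the command line, and `--no-forwarders` is a placeholder for whatever DNS forwarder choice suits your environment.

```shell
#!/bin/bash
# Hypothetical helper: translate a list of desired services (CA, KRA, DNS)
# into ipa-replica-install options and print the resulting command line.
replica_install_command() {
    local cmd="ipa-replica-install" svc
    for svc in "$@"; do
        case "$svc" in
            CA)  cmd="$cmd --setup-ca" ;;
            KRA) cmd="$cmd --setup-kra" ;;
            # --setup-dns also needs a forwarder decision; --no-forwarders
            # is used here only as a placeholder.
            DNS) cmd="$cmd --setup-dns --no-forwarders" ;;
            *)   echo "unknown service: $svc" >&2; return 1 ;;
        esac
    done
    echo "$cmd"
}
```

For example, `replica_install_command CA DNS` prints `ipa-replica-install --setup-ca --setup-dns --no-forwarders`. Review the printed command against Installing an IdM replica before running it on the new host.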
Verification steps
Test the Kerberos server on the new replica by successfully retrieving a Kerberos Ticket-Granting-Ticket as an IdM user.
[root@newreplica ~]# kinit admin
Password for admin@EXAMPLE.COM:

[root@newreplica ~]# klist
Ticket cache: KCM:0
Default principal: admin@EXAMPLE.COM

Valid starting       Expires              Service principal
10/31/2019 15:51:37  11/01/2019 15:51:02  HTTP/server.example.com@EXAMPLE.COM
10/31/2019 15:51:08  11/01/2019 15:51:02  krbtgt/EXAMPLE.COM@EXAMPLE.COM
Test the Directory Server and SSSD configuration on the new replica by retrieving user information.
[root@newreplica ~]# ipa user-show admin
  User login: admin
  Last name: Administrator
  Home directory: /home/admin
  Login shell: /bin/bash
  Principal alias: admin@EXAMPLE.COM
  UID: 1965200000
  GID: 1965200000
  Account disabled: False
  Password: True
  Member of groups: admins, trust admins
  Kerberos keys available: True
2.3. Recovering from losing multiple servers
If multiple servers are lost at the same time, determine whether the environment can be rebuilt by identifying which of the following five scenarios applies to your situation.
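The choice between the five scenarios can be expressed as a small decision function. This is a sketch of one reasonable reading of the scenarios, not an official tool: the function name `recovery_scenario` and its four arguments are hypothetical, and you must supply the facts about your own topology.

```shell
#!/bin/bash
# Hypothetical helper: pick the section that matches a multi-server loss.
# Arguments (all supplied by the caller):
#   $1  uses_ca          yes|no  deployment uses the IdM internal CA
#   $2  renewal_lost     yes|no  the CA renewal server is among the lost servers
#   $3  surviving_ca     number of CA replicas still running
#   $4  surviving_total  number of IdM servers still running
recovery_scenario() {
    local uses_ca="$1" renewal_lost="$2" surviving_ca="$3" surviving_total="$4"
    if [ "$surviving_total" -eq 0 ]; then
        echo "2.3.5 total infrastructure loss"
    elif [ "$uses_ca" = "no" ]; then
        echo "2.3.1 CA-less deployment"
    elif [ "$surviving_ca" -eq 0 ]; then
        echo "2.3.4 all CA replicas lost"
    elif [ "$renewal_lost" = "yes" ]; then
        echo "2.3.3 CA renewal server and other servers lost"
    else
        echo "2.3.2 CA renewal server unharmed"
    fi
}
```

For example, `recovery_scenario yes yes 1 2` selects section 2.3.3, because one CA replica survives and can be promoted to the renewal role.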
2.3.1. Recovering from losing multiple servers in a CA-less deployment
Because all servers in a CA-less deployment are considered equal, you can rebuild the environment by removing and replacing lost replicas in any order.
Procedure
- Remove and replace each lost replica. See Recovering from losing a regular replica.
2.3.2. Recovering from losing multiple servers when the CA renewal server is unharmed
Prerequisites
- Your deployment uses IdM’s internal Certificate Authority (CA).
Procedure
- Replace each lost replica. See Recovering from losing a regular replica.
2.3.3. Recovering from losing the CA renewal server and other servers
Prerequisites
- Your deployment uses IdM’s internal Certificate Authority (CA).
- At least one CA replica is unharmed.
Procedure
- Promote another CA replica to fulfill the CA renewal server role. See Recovering from losing the CA renewal server.
- Replace all other lost replicas. See Recovering from losing a regular replica.
2.3.4. Recovering from losing all CA replicas
Without any Certificate Authority (CA) replicas, the IdM environment has lost the ability to deploy additional replicas and rebuild itself.
Prerequisites
- Your deployment uses IdM’s internal Certificate Authority (CA).
Procedure
- This situation is a total loss.
Additional resources
- To prepare for total infrastructure loss, see Preparing for data loss with VM snapshots.
2.3.5. Recovering from a total infrastructure loss
If all servers are lost at once, and there are no Virtual Machine (VM) snapshots or data backups to restore from, this situation is unrecoverable.
Procedure
- This situation is a total loss.
Additional resources
- To prepare for total infrastructure loss, see Preparing for data loss with VM snapshots.