Chapter 22. Making a High-availability and Disaster Recovery Plan

Part of running a Directory Server deployment efficiently is planning for the worst-case scenario. This chapter covers general principles for drafting a disaster recovery plan and highlights features in Directory Server that can be used to aid in disaster recovery.
Disaster recovery is the planning and implementation of a smooth transition from one operating environment to another whenever there is a catastrophic failure. A disaster recovery plan for Directory Server may be part of a larger business continuity plan, or it could be a standalone plan specifically for an interruption in directory services.

Note

This chapter covers only general concepts for disaster recovery.
Disaster recovery can be complex and highly dependent on the details of a specific deployment. Consider using a professional service to design, maintain, and test any disaster recovery plan for sensitive or mission-critical services, like Red Hat Directory Server.

22.1. Identifying Potential Scenarios

The first step is identifying what potential issues you may encounter, what services will be affected, and what responses you should take. In the Red Hat Directory Server Deployment Guide, administrators make a site survey of their existing and proposed infrastructure to determine what kind of directory to design. Do something similar for disaster planning: as in Table 22.1, “Disaster Scenarios and Responses”, identify where your data infrastructure is, determine what the effect of losing each component is, and look at potential ideal responses.

Table 22.1. Disaster Scenarios and Responses

Scenario: Data corruption
Effects on Infrastructure: Through software or hardware failure (or through a malicious attack), the data at one site or on one server could be corrupted. If that corrupted server is a supplier in multi-supplier replication, then the corruption can quickly be propagated throughout the deployment.
Ideal Response: An isolated server should be available with access to the most recent backup of uncorrupted data. When a problem is detected, replication can be suspended on the regular infrastructure, and this server can be brought online to reinitialize the suppliers with good data.

Scenario: Natural disasters and other mass events
Effects on Infrastructure: Natural disasters can take an entire office or data center offline, even through something as simple as a long-term power outage.
Ideal Response: Directory operations can be transferred to a mirrored site at another physical location, with the same data.

Scenario: Server or machine loss
Effects on Infrastructure: A single machine could fail.
Ideal Response: Another machine, with the same data, can assume the lost machine's place.
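
One way to keep a scenario inventory like Table 22.1 usable during an incident is to record it in a machine-readable form that runbooks or monitoring scripts can query. The following Python fragment is only a minimal sketch of that idea. The names Scenario, PLAN, and response_for, along with the dictionary keys, are hypothetical and are not part of Directory Server or any of its tools. The response text is condensed from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One row of a disaster-planning inventory (hypothetical structure)."""
    name: str
    effect: str
    response: str


# Hypothetical inventory; the entries paraphrase Table 22.1.
PLAN = {
    "data_corruption": Scenario(
        name="Data corruption",
        effect="Corrupted data on a supplier can propagate through replication.",
        response="Suspend replication, then reinitialize suppliers from an "
                 "isolated server holding the latest uncorrupted backup.",
    ),
    "site_loss": Scenario(
        name="Natural disasters and other mass events",
        effect="An entire office or data center goes offline.",
        response="Transfer directory operations to a mirrored site at "
                 "another physical location.",
    ),
    "server_loss": Scenario(
        name="Server or machine loss",
        effect="A single machine fails.",
        response="Have another machine with the same data take its place.",
    ),
}


def response_for(key: str) -> str:
    """Return the planned response, failing loudly for unplanned scenarios."""
    try:
        return PLAN[key].response
    except KeyError:
        raise KeyError(f"no planned response for scenario {key!r}") from None


if __name__ == "__main__":
    for key, scenario in PLAN.items():
        print(f"{scenario.name}: {response_for(key)}")
```

Raising an error for an unknown key is a deliberate choice in this sketch: a scenario that has no planned response is a gap in the plan and should surface during testing rather than during an outage.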