Chapter 7. Designing the Replication Process

Replicating the directory contents increases the availability and performance of the directory service. Chapter 4, Designing the Directory Tree and Chapter 6, Designing the Directory Topology cover the design of the directory tree and the directory topology. This chapter addresses the physical and geographical location of the data and, specifically, how to use replication to ensure the data is available when and where it is needed.
This chapter discusses uses for replication and offers advice on designing a replication strategy for the directory environment.

7.1. Introduction to Replication

Replication is the mechanism that automatically copies directory data from one Red Hat Directory Server to another. Using replication, any directory tree or subtree (stored in its own database) can be copied between servers. The Directory Server that holds the main copy of the information automatically copies any updates to all replicas.
Replication provides a high-availability directory service and can distribute the data geographically. In practical terms, replication provides the following benefits:
  • Fault tolerance and failover — By replicating directory trees to multiple servers, the directory service is available even if hardware, software, or network problems prevent the directory client applications from accessing a particular Directory Server. Clients are referred to another Directory Server for read and write operations.

    Note

    Write failover is only possible with multi-supplier replication.
  • Load balancing — Replicating the directory tree across servers reduces the access load on any given machine, thereby improving server response time.
  • Higher performance and reduced response times — Replicating directory entries to a location close to users significantly improves directory response times.
  • Local data management — Replication allows information to be owned and managed locally while sharing it with other Directory Servers across the enterprise.

7.1.1. Replication Concepts

Always start planning replication by making the following fundamental decisions:
  • What information to replicate.
  • Which servers hold the main copy, or read-write replica, of that information.
  • Which servers hold the read-only copy, or read-only replica, of that information.
  • What should happen when a read-only replica receives an update request; that is, to which server it should refer the request.
These decisions cannot be made effectively without an understanding of how the Directory Server handles these concepts. For example, to decide what information to replicate, be aware of the smallest replication unit that the Directory Server can handle. The replication concepts used by the Directory Server provide a framework for thinking about the global decisions that need to be made.

7.1.1.1. Unit of Replication

The smallest unit of replication is a database. An entire database can be replicated but not a subtree within a database. Therefore, when defining the directory tree, always consider replication. For more information on how to set up the directory tree, see Chapter 4, Designing the Directory Tree.
The replication mechanism also requires that one database correspond to one suffix. A suffix (or namespace) that is distributed over two or more databases cannot be replicated.
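
The one-database-per-suffix rule can be checked against a deployment plan before any server is configured. The following is a minimal sketch, assuming a hypothetical planning table that maps each database name to the suffix it stores. The function name, the table layout, and the database names are illustrative and are not part of Directory Server.

```python
def replicable_suffixes(db_to_suffix):
    """Return the suffixes that are stored in exactly one database.

    db_to_suffix is a hypothetical planning table: database name -> suffix DN.
    A suffix that is distributed over two or more databases cannot be replicated.
    """
    counts = {}
    for suffix in db_to_suffix.values():
        # Crude DN normalization for the sketch: trim spaces, ignore case.
        key = ",".join(part.strip().lower() for part in suffix.split(","))
        counts[key] = counts.get(key, 0) + 1
    return sorted(s for s, n in counts.items() if n == 1)


# Hypothetical plan: ou=people is split over two databases.
plan = {
    "userRoot": "dc=example,dc=com",
    "peopleA": "ou=people,dc=example,dc=com",
    "peopleB": "ou=People, dc=example,dc=com",
}
print(replicable_suffixes(plan))
```

In this plan only dc=example,dc=com is reported, because ou=people is spread over two databases.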

7.1.1.2. Read-Write and Read-Only Replicas

A database that participates in replication is defined as a replica. Directory Server supports two types of replicas: read-write and read-only. The read-write replicas contain main copies of directory information and can be updated. Read-only replicas refer all update operations to read-write replicas.

7.1.1.3. Suppliers and Consumers

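A server that stores a replica that is copied to a different server is called a supplier. A server that stores a replica that is copied from a different server is called a consumer. Generally speaking, the replica on the supplier server is a read-write replica; the replica on the consumer server is a read-only replica. However, the following exceptions apply:
  • In cascading replication, the hub supplier holds a read-only replica that it supplies to consumers.
  • In multi-supplier replication, each supplier also acts as a consumer of the changes made on the other suppliers.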

Note

In the current version of Red Hat Directory Server, replication is always initiated by the supplier server, never by the consumer. This is unlike earlier versions of Directory Server, which allowed consumer-initiated replication (where consumer servers could retrieve data from a supplier server).

Suppliers

For any particular replica, the supplier server must:

  • Respond to read requests and update requests from directory clients.
  • Maintain state information and a changelog for the replica.
  • Initiate replication to consumer servers.
The supplier server always records the changes made to the read-write replicas that it manages and makes sure that those changes are replicated to consumer servers.

Consumers

A consumer server must:

  • Respond to read requests.
  • Refer update requests to a supplier server for the replica.
Whenever a consumer server receives a request to add, delete, or change an entry, the request is referred to a supplier for the replica. The supplier server performs the request, then replicates the change.

Hub Suppliers

In the special case of cascading replication, the hub supplier must:

  • Respond to read requests.
  • Refer update requests to a supplier server for the replica.
  • Initiate replication to consumer servers.
For more information on cascading replication, see Section 7.2.3, “Cascading Replication”.
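
The three roles described above differ only in whether they accept updates and whether they push changes onward. The following toy model illustrates this under stated assumptions: the Server class, its method names, and the host names are hypothetical, and changes are pushed immediately, whereas a real deployment replicates asynchronously according to its replication agreements.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    read_write: bool                 # supplier: True; hub and consumer: False
    refer_to: str = ""               # supplier named in referrals for updates
    consumers: list = field(default_factory=list)  # servers this one pushes to
    data: dict = field(default_factory=dict)

    def client_update(self, key, value):
        """A directory client asks this server to change an entry."""
        if not self.read_write:
            # Read-only replicas do not apply the change; they refer the client.
            return ("referral", self.refer_to)
        self.data[key] = value
        self.replicate([(key, value)])
        return ("ok", self.name)

    def replicate(self, changes):
        """Push changes to every server listed as a consumer of this one."""
        for consumer in self.consumers:
            consumer.receive(changes)

    def receive(self, changes):
        """Apply replicated changes, then pass them on (hub behavior)."""
        for key, value in changes:
            self.data[key] = value
        self.replicate(changes)


# Cascading layout: supplier -> hub -> consumer.
consumer = Server("consumer1", read_write=False, refer_to="supplier1")
hub = Server("hub1", read_write=False, refer_to="supplier1", consumers=[consumer])
supplier = Server("supplier1", read_write=True, consumers=[hub])

print(consumer.client_update("uid=jdoe", "mail=jdoe@example.com"))
print(supplier.client_update("uid=jdoe", "mail=jdoe@example.com"))
print(consumer.data)
```

The update sent to the consumer comes back as a referral to the supplier. The same update sent to the supplier is applied there and reaches the consumer through the hub.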

7.1.1.4. Replication and Changelogs

Every supplier server maintains a changelog. A changelog is a record of the modifications that have occurred on a replica. The supplier server then replays these modifications on the replicas stored on consumer servers, or on other suppliers in the case of multi-supplier replication.
When an entry is modified, a change record describing the LDAP operation that was performed is recorded in the changelog.
The changelog size is controlled by two attributes, nsslapd-changelogmaxage and nsslapd-changelogmaxentries. These attributes trim old changelog records to keep the changelog size reasonable.
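
The effect of an age limit and an entry-count limit can be illustrated with a small model. This is a sketch only: the function, its parameters, and the record layout are assumptions made for illustration, and the trimming logic in the server itself may apply additional rules.

```python
from collections import deque


def trim_changelog(changelog, now, max_age=None, max_entries=None):
    """Drop the oldest records until both limits are satisfied.

    changelog is a deque of (timestamp_seconds, operation) in arrival order.
    max_age (in seconds) and max_entries stand in for the two limits;
    None disables a limit. Returns the number of records removed.
    """
    removed = 0
    while changelog:
        timestamp, _ = changelog[0]
        too_old = max_age is not None and now - timestamp > max_age
        too_many = max_entries is not None and len(changelog) > max_entries
        if not (too_old or too_many):
            break
        changelog.popleft()
        removed += 1
    return removed


log = deque([(0, "add"), (50, "mod"), (90, "mod"), (100, "del")])
removed = trim_changelog(log, now=100, max_age=60, max_entries=2)
print(removed, list(log))
```

Here the first record is removed because it is older than the age limit, and the second because the log still holds more records than the entry limit allows.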

7.1.1.5. Replication Agreement

Directory Servers use replication agreements to define replication. A replication agreement describes replication between a single supplier and a single consumer. The agreement is configured on the supplier server. It identifies:
  • The database to replicate.
  • The consumer server to which the data is pushed.
  • The times that replication can occur.
  • The DN that the supplier server must use to bind (called the supplier bind DN).
  • How the connection is secured (TLS, Start TLS, client authentication, SASL, or simple authentication).
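
Because each agreement covers exactly one supplier and one consumer, a supplier that feeds several consumers needs one agreement per consumer. The sketch below models the listed items as a plain data structure. The class, its field names, and the host names are hypothetical and do not correspond to the server's configuration attributes; the bind DN is only an example value.

```python
from dataclasses import dataclass
from typing import Optional

# The security choices named in the list above.
SECURITY_OPTIONS = {
    "TLS", "Start TLS", "client authentication", "SASL", "simple authentication",
}


@dataclass(frozen=True)
class ReplicationAgreement:
    database: str                   # the database (suffix) to replicate
    consumer: str                   # host:port of the consumer receiving the data
    supplier_bind_dn: str           # DN the supplier uses to bind to the consumer
    security: str                   # one of SECURITY_OPTIONS
    schedule: Optional[str] = None  # None means "always in sync" in this sketch

    def __post_init__(self):
        if self.security not in SECURITY_OPTIONS:
            raise ValueError(f"unknown security option: {self.security!r}")
        if not self.supplier_bind_dn:
            raise ValueError("a supplier bind DN is required")


def agreements_for(database, supplier_bind_dn, consumers, security="TLS"):
    """Build one agreement per consumer for a single supplier."""
    return [
        ReplicationAgreement(database, consumer, supplier_bind_dn, security)
        for consumer in consumers
    ]


agreements = agreements_for(
    "dc=example,dc=com",
    "cn=replication manager,cn=config",   # example bind DN
    ["consumer1.example.com:636", "consumer2.example.com:636"],
)
for agreement in agreements:
    print(agreement.consumer, agreement.security)
```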

7.1.2. Data Consistency

Consistency refers to how closely the contents of replicated databases match each other at a given point in time. Part of the configuration for replication between servers is to schedule updates. The supplier server always determines when consumer servers need to be updated and initiates replication.
Directory Server offers the option of keeping replicas always synchronized or of scheduling updates for a particular time of day or day in the week.
The advantage of keeping replicas constantly synchronized is that it provides better data consistency. The cost is the network traffic resulting from the frequent update operations. This solution is the best option when:
  • There is a reliable, high-speed connection between servers.
  • The client requests serviced by the directory service are mainly search, read, and compare operations, with relatively few update operations.
If a lower level of data consistency is acceptable, choose the update frequency that best suits the usage patterns of the network or reduces the effect on network traffic. Scheduled updates are a better choice than constant updates in several situations:
  • There are unreliable or intermittently available network connections.
  • The client requests serviced by the directory service are mainly update operations.
  • Communication costs have to be lowered.
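
A scheduled update amounts to a time window during which the supplier is allowed to start replication. The following sketch shows the idea for a window that starts and ends on the same day. The function name and the way the window is expressed are assumptions for illustration and do not reflect the server's own schedule syntax.

```python
from datetime import datetime, time


def may_replicate(now, start, end, weekdays):
    """Return True if `now` falls inside the allowed replication window.

    start and end are times of day with start < end (same-day window only).
    weekdays is a set of integers where Monday is 0 and Sunday is 6.
    """
    if start >= end:
        raise ValueError("this sketch only handles windows within one day")
    return now.weekday() in weekdays and start <= now.time() < end


# Hypothetical window: 01:00 to 03:00, Monday through Friday.
window = (time(1, 0), time(3, 0), {0, 1, 2, 3, 4})
print(may_replicate(datetime(2024, 1, 1, 2, 0), *window))   # a Monday, inside the window
print(may_replicate(datetime(2024, 1, 6, 2, 0), *window))   # a Saturday
```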
In the case of multi-supplier replication, the replicas on each supplier are said to be loosely consistent because, at any given time, there can be differences in the data stored on each supplier. This is true even if the replicas are constantly synchronized, for two reasons:
  • There is a latency in the propagation of update operations between suppliers.
  • The supplier that serviced the update operation does not wait for the second supplier to validate it before returning an "operation successful" message to the client.
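
Loose consistency can be illustrated with two suppliers and one update. In this toy model, which assumes a hypothetical Supplier class and ignores conflict resolution entirely, the first supplier acknowledges the client before the second supplier has seen the change, so the two replicas differ until propagation completes.

```python
class Supplier:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = []   # changes accepted locally but not yet sent to the peer

    def client_update(self, key, value):
        # Acknowledge immediately; the peer has not seen the change yet.
        self.data[key] = value
        self.outbox.append((key, value))
        return "operation successful"

    def replicate_to(self, peer):
        # Replay the pending changes on the peer, then clear the backlog.
        for key, value in self.outbox:
            peer.data[key] = value
        self.outbox.clear()


a, b = Supplier("A"), Supplier("B")
result = a.client_update("uid=jdoe", "telephoneNumber=555-0100")
diverged = a.data != b.data     # B has not received the change yet
a.replicate_to(b)
converged = a.data == b.data    # the replicas match once propagation completes
print(result, diverged, converged)
```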