7.3. Defining a Replication Strategy

The replication strategy is determined by the services that must be provided. To determine the replication strategy, start by performing a survey of the network, users, applications, and how they use the directory service.
After planning the replication strategy, it is possible to deploy the directory service. It is best to deploy the directory service in stages, because this allows administrators to adjust the directory service according to the loads that the enterprise places on the directory service. Unless the load analysis is based on an already operating directory, be prepared to alter the directory services as the real-life demands on the directory become clear.

7.3.1. Conducting a Replication Survey

Gather information about the network quality and usage in the site survey to help define the replication strategy:
  • The quality of the LANs and WANs connecting different buildings or remote sites and the amount of available bandwidth.
  • The physical location of users, how many users are at each site, and their usage patterns; that is how they intend to use the directory service.
  • The number of applications that access the directory service and the relative percentage of read, search, and compare operations to write operations.
  • If the messaging server uses the directory, find out how many operations it performs for each email message it handles. Other products that rely on the directory service are typically products such as authentication applications or meta-directory applications. For each one, determine the type and frequency of operations that are performed in the directory service.
  • The number and size of the entries stored in the directory service.
A site that manages human resource databases or financial information is likely to put a heavier load on the directory service than a site containing engineering staff that uses the directory solely for telephone book purposes.

7.3.2. Replicate Selected Attributes with Fractional Replication

Fractional replication allows the administrator to choose a set of attributes that are not transmitted from a supplier to the consumer (or another supplier). Administrators can therefore replicate a database without replicating all the information that it contains.
Fractional replication is enabled and configured per replication agreement. The exclusion of attributes is applied equally to all entries. As far as the consumer server is concerned, the excluded attributes always have no value. Therefore, a client performing a search against the consumer server never sees the excluded attributes. Similarly, should it perform a search that specifies those attributes in its filter, no entries match.
Fractional replication is particularly useful in the following situations:
  • Where the consumer server is connected using a slow network, excluding infrequently changed attributes or larger attributes such as jpegPhoto results in less network traffic.
  • Where the consumer server is placed on an untrusted network such as the public Internet, excluding sensitive attributes such as telephone numbers provides an extra level of protection that guarantees no access to those attributes even if the server's access control measures are defeated or the machine is compromised by an attacker.
Configuring fractional replication is described in the replication agreement and supplier configuration sections in chapter 8, "Managing Replication," in the Administration Guide.

7.3.3. Replication Resource Requirements

Using replication requires more resources. Consider the following resource requirements when defining the replication strategy:
  • Disk usage — On supplier servers, the changelog is written after each update operation. Supplier servers that receive many update operations may experience higher disk usage.

    Note

    Each supplier server uses a single changelog. If a supplier contains multiple replicated databases, the changelog is used more frequently, and the disk usage is even higher.
  • Server threads — Each replication agreement consumes one server thread. So, the number of threads available to client applications is reduced, possibly affecting the server performance for the client applications.
  • File descriptors — The number of file descriptors available to the server is reduced by the changelog (one file descriptor) and each replication agreement (one file descriptor per agreement).

7.3.4. Managing Disk Space Required for Multi-Supplier Replication

Multi-supplier replicas maintain additional logs, including the changelog of directory edits, state information for update entries, and tombstone entries for deleted entries. This information is required for multi-supplier replication to be performed. Because these log files can get very large, periodically cleaning up these files is necessary to keep from wasting disk space.
There are four attributes which can configure the changelog maintenance for the multi-supplier replica. Two are under cn=changelog5 and relate directly to trimming the changelog:
  • nsslapd-changelogmaxage sets the maximum age that the entries in the changelog can be; once an entry is older than that limit, it is deleted. This keeps the changelog from growing indefinitely.
  • nsslapd-changelogmaxentries sets the maximum number of entries that are allowed in the changelog. Like nsslapd-changelogmaxage, this also trims the changelog, but be careful about the setting. This must be large enough to allow a complete set of directory information or multi-supplier replication may not function properly.
The other two attributes are under the replication agreement entry in cn=replica, cn=suffixDN, cn=mapping tree, cn=config. These two attributes relate to maintenance information kept in the changelog, the tombstone and state information, rather than the directory edits information.
  • nsDS5ReplicaPurgeDelay sets the maximum age that tombstone (deleted) entries and state information can be in the changelog. Once a tombstone or state information entry is older than that age, it is deleted. This differs from the nsslapd-changelogmaxage attribute in that the nsDS5ReplicaPurgeDelay value applies only to tombstone and state information entries; nsslapd-changelogmaxage applies to every entry in the changelog, including directory modifications.
  • nsDS5ReplicaTombstonePurgeInterval sets the frequency which the server runs a purge operation. At this interval, the Directory Server runs an internal operation to clean the tombstone and state entries out of the changelog. Make sure that the maximum age is longer than the longest replication update schedule or multi-supplier replication may not be able to update replicas properly.
The parameters for managing replication and the changelog are described in chapter 2, "Core Configuration Attributes," in the Configuration, Command, and File Reference.

7.3.5. Replication Across a Wide-Area Network

Wide-area networks typically have higher latency, a higher bandwidth-delay product, and lower speeds than local area networks . Directory Server supports efficient replication when a supplier and consumer are connected using a wide-area network.
In previous versions of Directory Server, the replication protocols that were used to transmit entries and updates between suppliers and consumers were highly latency-sensitive, because the supplier would send only one update operation and then wait for a response from the consumer. This led to reduced throughput with higher latencies.
The supplier sends many updates and entries to the consumer without waiting for a response. Thus, on a network with high latency, many replication operations can be in transit on the network, and replication throughput is similar to that which can be achieved on a local area network.

Note

If a supplier is connected to another supplier running a version of Red Hat Directory Server earlier than 7.1, it falls back to the old replication mechanism for compatibility. It is therefore necessary to run at least version 7.1 on both the supplier and consumer servers in order to achieve latency-insensitive replication.
There are both performance and security issues to consider for both the Directory Server and the efficiency of the network connection:
  • Where replication is performed across a public network such as the Internet, the use of TLS is highly recommended. This guards against eavesdropping of the replication traffic.
  • Use a T-1 or faster Internet connection for the network.
  • When creating agreements for replication over a wide-area network, avoid constant synchronization between the servers. Replication traffic could consume a large portion of the bandwidth and slow down the overall network and Internet connections.
  • When initializing consumers, do not initialize the consumer immediately; instead, utilize file system replica initialization, which is much faster than online initialization or initializing from file. See the Red Hat Directory Server Administration Guide for information on using file system replica initialization.

7.3.6. Using Replication for High Availability

Use replication to prevent the loss of a single server from causing the directory service to become unavailable. At a minimum, replicate the local directory tree to at least one backup server.
Some directory architects argue that information should be replicated three times per physical location for maximum data reliability. The extent to use replication for fault tolerance depends on the environment and personal preferences, but base this decision on the quality of the hardware and networks used by the directory service. Unreliable hardware requires more backup servers.

Note

Do not use replication as a replacement for a regular data backup policy. For information on backing up the directory data, see the Red Hat Directory Server Administration Guide.
To guarantee write-failover for all directory clients, use a multi-supplier replication scenario. If read-failover is sufficient, use single-supplier replication.
LDAP client applications can usually be configured to search only one LDAP server. Unless there is a custom client application to rotate through LDAP servers located at different DNS host names, the LDAP client applications can only be configured to look up a single DNS host name for a Directory Server. Therefore, it is probably necessary to use either DNS round-robins or network sorts to provide failover to the backup Directory Servers. For information on setting up and using DNS round-robins or network sorts, see the DNS documentation.

7.3.7. Using Replication for Local Availability

The necessity of replicating for local availability is determined by the quality of the network as well as the activities of the site. In addition, carefully consider the nature of the data contained in the directory service and the consequences to the enterprise if that data were to become temporarily unavailable. The more mission-critical the data, the less tolerant the system is of outages caused by poor network connections.
Use replication for local availability for the following reasons:
  • To keep a local main copy of the data.
    This is an important strategy for large, multinational enterprises that need to maintain directory information of interest only to the employees in a specific country. Having a local main copy of the data is also important to any enterprise where interoffice politics dictate that data be controlled at a divisional or organizational level.
  • To mitigate unreliable or intermittently available network connections.
    Intermittent network connections can occur if there are unreliable WANs, as often occurs in international networks.
  • To offset periodic, extremely heavy network loads that may cause the performance of the directory service to be severely reduced.
    Performance may also be affected in enterprises with aging networks, which may experience these conditions during normal business hours.

7.3.8. Using Replication for Load Balancing

Replication can balance the load on the Directory Servers in several ways:
  • By spreading the users' search activities across several servers.
  • By dedicating servers to read-only activities (writes occur only on the supplier server).
  • By dedicating special servers to specific tasks, such as supporting mail server activities.
Balancing the workload of the network is an important function performed by directory data replication. Whenever possible, move data to servers that can be accessed using a reasonably fast and reliable network connection. The most important considerations are the speed and reliability of the network connection between the server and the directory users.
Directory entries generally average around one kilobyte (KB) in size. Therefore, every directory lookup adds about one KB to the network load. If the directory users perform ten directory lookups per day, then, for every directory user, there is an increased network load of around 10 KB per day. If the site has a slow, heavily loaded, or unreliable WAN, then consider replicating the directory tree to a local server.
Also consider whether the benefit of locally available data is worth the cost of the increased network load caused by replication. If an entire directory tree is replicated to a remote site, for instance, that potentially adds a large strain on the network in comparison to the traffic caused by the users' directory lookups. This is especially true if the directory tree is changing frequently, yet there are only a few users at the remote site performing a few directory lookups per day.
Table 7.1, “Effects of Replication and Remote Lookup on the Network” compares the approximate cost of replicating a directory of one million entries, where 10% of those entries undergo daily change, with the cost of having a small remote site of 100 employees perform 10 lookups per day. In each case the average size of a directory entry is assumed to be 1KB.

Table 7.1. Effects of Replication and Remote Lookup on the Network

Load Type Objects[a] Accesses/Day[b] Avg. Entry Size Load
Replication 1 million 100,000 1KB 100Mb/day
Remote Lookup 100 1,000 1KB 1Mb/day
[a] For replication, objects refers to the number of entries in the database. For remote lookup, it refers to the number of users who access the database.
[b] For replication, Accesses/Day is based on a 10% change rate to the database that needs to be replicated. For remote lookup, it is based on ten lookups per day for each remote user.
Given the difference in loads caused by replication versus that caused by normal directory usage, using replication for network load-balancing purposes may not be desirable. On the other hand, the benefits of locally available directory data can far outweigh any considerations regarding network loads.
A good compromise between making data available to local sites and overloading the network is to use scheduled replication. For more information on data consistency and replication schedules, see Section 7.1.2, “Data Consistency”.

7.3.8.1. Example of Network Load Balancing

In this example, the enterprise has offices in New York and Los Angeles, and each office has specific subtrees that they manage.
Managing Enterprise Subtrees in Remote Offices

Figure 7.9. Managing Enterprise Subtrees in Remote Offices

Each office contains a high-speed network, but the connection between two cities is unreliable. To balance the network load:
  1. Select one server in each office to be the supplier server for the locally managed data.
  2. Replicate locally managed data from that server to the corresponding supplier server in the remote office.
  3. Replicate the directory tree on each supplier server (including data supplied from the remote office) to at least one local Directory Server to ensure availability of the directory data. Use multi-supplier replication for the suffix that is managed locally, and cascading replication for the suffix that receives a main copy of the data from a remote server.

7.3.8.2. Example of Load Balancing for Improved Performance

Suppose that the enterprise has the following characteristics:
  • Uses a Directory Server that includes 1.5 million entries in support of one million users
  • Each user performs ten directory lookups per day
  • Uses a messaging server that handles 25 million mail messages per day
  • The messaging server performs five directory lookups for every mail message that it handles
This equates to ten million directory lookups per day for users, and 125 million directory lookups per day for email; a total of 135 million directory lookups per day.
With an eight-hour business day and users spread across four time zones, for example, the business day (or peak usage) across four time zones extends to 12 hours. Therefore, the service must support 135 million directory lookups in a 12-hour day. This equates to 3,125 lookups per second (135,000,000 / (60*60*12)).

Table 7.2. Calculating Directory Server Load

Access Type Type Count Accesses per Day Total Accesses
User Lookup 1 million 10 10 million
Email Lookup 25 million 5 125 million
Combined accesses 135 million
Total 135 million (3,125/second)
If the hardware that runs the Directory Servers supports 500 reads per second, at least six or seven Directory Servers must be used to support this load. For enterprises with a million directory users, add more Directory Servers for local availability purposes.
There are several different methods of replication:
  • Place two Directory Servers in a multi-supplier configuration in one city to handle all write traffic.
    This configuration assumes that there should be a single point of control for all directory data.
  • Use these supplier servers to replicate to one or more hub suppliers.
    The read, search, and compare requests serviced by the directory service should be targeted at the consumer servers, thereby freeing the supplier servers to handle write requests.
  • Use the hub supplier to replicate to local sites throughout the enterprise.
    Replicating to local sites helps balance the workload of the servers and the WANs, as well as ensuring high availability of directory data.
  • At each site, replicate at least once to ensure high availability, at least for read operations.
  • Use DNS sort to ensure that local users always find a local Directory Server they can use for directory lookups.

7.3.8.3. Example Replication Strategy for a Small Site

Example Corp. has the following characteristics:
  • The entire enterprise is contained within a single building.
  • The building has a very fast (100 Mb per second) and lightly used network.
  • The network is very stable, and the server hardware and OS platforms are reliable.
  • A single server is capable of easily handling the site's load.
In this case, Example Corp. decides to replicate at least once to ensure availability in the event the primary server is shut down for maintenance or hardware upgrades. Also, set up a DNS round-robin to improve LDAP connection performance in the event that one of the Directory Servers becomes unavailable.

7.3.8.4. Example Replication Strategy for a Large Site

As Example Corp. has grown, it retains its previous characteristics (as in Section 7.3.8.3, “Example Replication Strategy for a Small Site”) with a few changes:
  • The enterprise is contained within two separate buildings.
  • There are slow connections between the buildings, and these connections are very busy during normal business hours.
As their network needs changes, then Example Corp.'s administrators adjust their replication strategy:
  • Choose a single server in one of the two buildings to contain a main copy of the directory data.
    This server should be placed in the building that contains the largest number of people responsible for the main copy of the directory data. We shall see this building as Building A.
  • Replicate at least once within Building A for high availability of directory data.
    Use a multi-supplier replication configuration to ensure write-failover.
  • Create two replicas in the second building (Building B).
  • If there is no need for close consistency between the supplier and consumer server, schedule replication so that it occurs only during off-peak hours.