7.3. Defining a Replication Strategy
- Assess the resources within the network, the traffic loads, and resource requirements for the directory service.
- If there are multiple consumers for different locations or sections of the company, or if some servers are insecure, use fractional replication to exclude sensitive or seldom-modified information. This maintains data integrity without compromising sensitive information. See Section 7.3.2, “Replicate Selected Attributes with Fractional Replication” for more information.
- If the network is stretched across a wide geographical area, use multiple Directory Servers at multiple sites, with local data masters connected by multi-master replication. See Section 7.3.5, “Replication Across a Wide-Area Network” for more information.
- If high availability is the primary concern, create a data center with multiple Directory Servers on a single site. Single-master replication provides read-failover, while multi-master replication provides write-failover. See Section 7.3.6, “Using Replication for High Availability” for more information.
- If local availability is the primary concern, use replication to distribute data geographically to Directory Servers in local offices around the world. A master copy of all information can be maintained in a single location, such as the company headquarters, or each local site can manage the parts of the DIT that are relevant for them. See Section 7.3.7, “Using Replication for Local Availability” for more information.
- In all cases, balance the load of requests serviced by the Directory Servers and avoid network congestion. See Section 7.3.8, “Using Replication for Load Balancing” for more information.
7.3.1. Conducting a Replication Survey
- The quality of the LANs and WANs connecting different buildings or remote sites and the amount of available bandwidth.
- The physical location of users, how many users are at each site, and their usage patterns; that is, how they intend to use the directory service.
- The number of applications that access the directory service and the relative percentage of read, search, and compare operations to write operations.
- If the messaging server uses the directory, find out how many operations it performs for each email message it handles. Other products that rely on the directory service typically include authentication applications and meta-directory applications. For each one, determine the type and frequency of operations that are performed in the directory service.
- The number and size of the entries stored in the directory service.
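The read-to-write ratio mentioned in the survey list can be worked out from raw operation counts. The following is a minimal sketch, assuming the counts have already been collected per application; the function name `operation_mix` and the sample figures are hypothetical and are not taken from any Directory Server tool.

```python
def operation_mix(counts):
    """Return the share of read-type and write-type operations as percentages.

    `counts` maps an operation name to how many times it was observed.
    Read, search, and compare are treated as read-type operations;
    everything else is counted as a write.
    """
    read_ops = ("read", "search", "compare")
    total = sum(counts.values())
    if total == 0:
        return {"read_pct": 0.0, "write_pct": 0.0}
    reads = sum(counts.get(op, 0) for op in read_ops)
    return {
        "read_pct": 100.0 * reads / total,
        "write_pct": 100.0 * (total - reads) / total,
    }


# Hypothetical daily counts for one application.
sample = {"search": 9000, "compare": 500, "read": 0, "write": 500}
mix = operation_mix(sample)
print(mix)
```

A workload that is heavily read-oriented, as in this sample, is usually a good candidate for serving reads from consumers.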
7.3.2. Replicate Selected Attributes with Fractional Replication
- Where the consumer server is connected using a slow network, excluding infrequently changed attributes or larger attributes such as jpegPhoto results in less network traffic.
- Where the consumer server is placed on an untrusted network such as the public Internet, excluding sensitive attributes such as telephone numbers provides an extra level of protection that guarantees no access to those attributes even if the server's access control measures are defeated or the machine is compromised by an attacker.
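The effect of the two cases above can be illustrated with a small model of what a consumer receives. This is a conceptual sketch only: it does not show how Directory Server is configured for fractional replication, and the function name `fractional_view` and the sample entry are hypothetical.

```python
def fractional_view(entry, excluded):
    """Return a copy of `entry` without the excluded attributes.

    Attribute names are compared case-insensitively, because LDAP
    attribute names are not case-sensitive.
    """
    blocked = {name.lower() for name in excluded}
    return {
        attr: values
        for attr, values in entry.items()
        if attr.lower() not in blocked
    }


# Hypothetical entry as held on the supplier.
entry = {
    "cn": ["Example User"],
    "mail": ["user@example.com"],
    "telephoneNumber": ["+1 555 0100"],
    "jpegPhoto": [b"\xff\xd8..."],
}

# The consumer never receives the excluded attributes at all.
replicated = fractional_view(entry, ["jpegPhoto", "telephonenumber"])
print(sorted(replicated))
```

Because the excluded values are never sent, they cannot be read from the consumer even if its access controls fail.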
7.3.3. Replication Resource Requirements
- Disk usage — On supplier servers, the changelog is written after each update operation. Supplier servers that receive many update operations may experience higher disk usage.
Note: Each supplier server uses a single changelog. If a supplier contains multiple replicated databases, the changelog is used more frequently, and the disk usage is even higher.
- Server threads — Each replication agreement consumes one server thread. So, the number of threads available to client applications is reduced, possibly affecting the server performance for the client applications.
- File descriptors — The number of file descriptors available to the server is reduced by the changelog (one file descriptor) and each replication agreement (one file descriptor per agreement).
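The thread and file descriptor costs listed above add up linearly with the number of agreements. A minimal sketch of that arithmetic follows, assuming one thread and one file descriptor per agreement plus one file descriptor for the changelog, as stated in the list; the function name `replication_overhead` is hypothetical.

```python
def replication_overhead(agreements, has_changelog=True):
    """Estimate the threads and file descriptors taken up by replication.

    One thread and one file descriptor per replication agreement,
    plus one file descriptor for the changelog on a supplier.
    """
    if agreements < 0:
        raise ValueError("agreements must not be negative")
    threads = agreements
    file_descriptors = agreements + (1 if has_changelog else 0)
    return {"threads": threads, "file_descriptors": file_descriptors}


# A supplier with four replication agreements.
overhead = replication_overhead(4)
print(overhead)
```

Subtracting these figures from the server's configured limits gives a rough idea of what remains for client connections.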
7.3.4. Managing Disk Space Required for Multi-Master Replication
Two attributes are under cn=changelog5 and relate directly to trimming the changelog:
nsslapd-changelogmaxage sets the maximum age of entries in the changelog; once an entry is older than that limit, it is deleted. This keeps the changelog from growing indefinitely.
nsslapd-changelogmaxentries sets the maximum number of entries that are allowed in the changelog. Like nsslapd-changelogmaxage, this also trims the changelog, but be careful about the setting. It must be large enough to allow a complete set of directory information, or multi-master replication may not function properly.
cn=mapping tree, cn=config. These two attributes relate to maintenance information kept in the changelog, the tombstone and state information, rather than the directory edits information.
nsDS5ReplicaPurgeDelay sets the maximum age that tombstone (deleted) entries and state information can be in the changelog. Once a tombstone or state information entry is older than that age, it is deleted. This differs from the nsslapd-changelogmaxage attribute in that the nsDS5ReplicaPurgeDelay value applies only to tombstone and state information entries; nsslapd-changelogmaxage applies to every entry in the changelog, including directory modifications.
nsDS5ReplicaTombstonePurgeInterval sets the frequency at which the server runs a purge operation. At this interval, the Directory Server runs an internal operation to clean the tombstone and state entries out of the changelog. Make sure that the maximum age is longer than the longest replication update schedule, or multi-master replication may not be able to update replicas properly.
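The interaction between an age limit and an entry-count limit can be pictured with a simplified model. The sketch below is not the server's actual trimming algorithm; it only assumes that entries past the age limit are dropped first and that the newest entries are kept when the count limit is exceeded. The function names and all time values (in seconds) are hypothetical.

```python
def trim_changelog(entries, now, max_age, max_entries):
    """Return the entries that survive trimming.

    `entries` is a list of (timestamp, change) tuples sorted oldest first.
    A `max_entries` of 0 means no limit on the number of entries.
    """
    fresh = [e for e in entries if now - e[0] <= max_age]
    if max_entries and len(fresh) > max_entries:
        fresh = fresh[-max_entries:]  # keep only the newest entries
    return fresh


def purge_delay_is_safe(purge_delay, longest_schedule_gap):
    """Check that the maximum age exceeds the longest replication gap."""
    return purge_delay > longest_schedule_gap


log = [(0, "add"), (100, "mod"), (200, "mod"), (300, "del"), (400, "mod")]
kept = trim_changelog(log, now=450, max_age=300, max_entries=2)
print(kept)

# One week of retention against a daily replication window.
print(purge_delay_is_safe(7 * 86400, 86400))
```

If a consumer is updated less often than entries are purged, the changes it still needs may already be gone, which is why the second check matters.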
7.3.5. Replication Across a Wide-Area Network
- Where replication is performed across a public network such as the Internet, the use of SSL is highly recommended. This guards against eavesdropping of the replication traffic.
- Use a T-1 or faster Internet connection for the network.
- When creating agreements for replication over a wide-area network, avoid constant synchronization between the servers. Replication traffic could consume a large portion of the bandwidth and slow down the overall network and Internet connections.
- When initializing consumers, do not initialize the consumer immediately; instead, use file system replica initialization, which is much faster than online initialization or initializing from file. See the Red Hat Directory Server Administrator's Guide for information on using file system replica initialization.
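To see why link speed and constant synchronization matter, it helps to estimate how long a given volume of replication data occupies the link. The following is a rough sketch, assuming a T-1 line rate of 1.544 Mbit/s and ignoring protocol overhead; the function name `transfer_seconds`, the `usable_fraction` parameter, and the payload size are hypothetical.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T-1 line rate


def transfer_seconds(payload_bytes, link_bps=T1_BITS_PER_SECOND, usable_fraction=1.0):
    """Estimate the time needed to send `payload_bytes` over a link.

    `usable_fraction` is the share of the link that replication may use,
    for example 0.5 if half the bandwidth is reserved for other traffic.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return payload_bytes * 8 / (link_bps * usable_fraction)


# Hypothetical payload of 193 MB of replication data.
full_link = transfer_seconds(193_000_000)
half_link = transfer_seconds(193_000_000, usable_fraction=0.5)
print(full_link, half_link)
```

Estimates like this can help decide whether updates should be scheduled for quiet periods rather than sent continuously.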
7.3.6. Using Replication for High Availability
7.3.7. Using Replication for Local Availability
- To keep a local master copy of the data. This is an important strategy for large, multinational enterprises that need to maintain directory information of interest only to the employees in a specific country. Having a local master copy of the data is also important to any enterprise where interoffice politics dictate that data be controlled at a divisional or organizational level.
- To mitigate unreliable or intermittently available network connections. Intermittent network connections can occur if there are unreliable WANs, as often occurs in international networks.
- To offset periodic, extremely heavy network loads that may cause the performance of the directory service to be severely reduced. Performance may also be affected in enterprises with aging networks, which may experience these conditions during normal business hours.
7.3.8. Using Replication for Load Balancing
- By spreading the users' search activities across several servers.
- By dedicating servers to read-only activities (writes occur only on the supplier server).
- By dedicating special servers to specific tasks, such as supporting mail server activities.
Table 7.1. Effects of Replication and Remote Lookup on the Network
|Load Type|Objects[a]|Accesses/Day[b]|Avg. Entry Size|Load|
|---|---|---|---|---|
[a] For replication, objects refers to the number of entries in the database. For remote lookup, it refers to the number of users who access the database.
[b] For replication, Accesses/Day is based on a 10% change rate to the database that needs to be replicated. For remote lookup, it is based on ten lookups per day for each remote user.
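The two footnotes describe how each load type is derived, which can be sketched as a short calculation. The example assumes that load is the number of accesses per day multiplied by the average entry size; the function names and the input figures are hypothetical and are not the values from Table 7.1.

```python
def replication_load(entries, avg_entry_bytes, change_rate=0.10):
    """Bytes per day sent by replication, given a daily change rate."""
    accesses_per_day = entries * change_rate
    return accesses_per_day * avg_entry_bytes


def remote_lookup_load(remote_users, avg_entry_bytes, lookups_per_user=10):
    """Bytes per day fetched by remote users looking up entries."""
    accesses_per_day = remote_users * lookups_per_user
    return accesses_per_day * avg_entry_bytes


# Hypothetical site: 1,000,000 entries, 100,000 remote users, 1,000-byte entries.
repl = replication_load(1_000_000, 1_000)
remote = remote_lookup_load(100_000, 1_000)
print(round(repl), round(remote))
```

With these sample inputs, replicating the data locally moves about a tenth of the bytes that remote lookups would.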
7.3.8.1. Example of Network Load Balancing
Figure 7.9. Managing Enterprise Subtrees in Remote Offices
- Select one server in each office to be the supplier server for the locally managed data.
- Replicate locally managed data from that server to the corresponding supplier server in the remote office.
- Replicate the directory tree on each supplier server (including data supplied from the remote office) to at least one local Directory Server to ensure availability of the directory data. Use multi-master replication for the suffix that is managed locally, and cascading replication for the suffix that receives a master copy of the data from a remote server.
7.3.8.2. Example of Load Balancing for Improved Performance
- Uses a Directory Server that includes 1.5 million entries in support of one million users
- Each user performs ten directory lookups per day
- Uses a messaging server that handles 25 million mail messages per day
- The messaging server performs five directory lookups for every mail message that it handles
Table 7.2. Calculating Directory Server Load
|Access Type|Type Count|Accesses per Day|Total Accesses|
|---|---|---|---|
|User Lookup|1 million|10|10 million|
|Email Lookup|25 million|5|125 million|
|Combined accesses| | |135 million|
|Total| | |135 million (3,125/second)|
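The totals in Table 7.2 can be reproduced with a short calculation. Note that the 12-hour busy period used below is an inference: it is the window that makes 135 million accesses work out to 3,125 per second, and it is not stated in the table itself. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def daily_accesses(users, lookups_per_user, messages, lookups_per_message):
    """Total directory accesses per day from users and mail handling."""
    return users * lookups_per_user + messages * lookups_per_message


def accesses_per_second(total_per_day, busy_hours):
    """Average rate if the daily total is spread over `busy_hours`."""
    return total_per_day / (busy_hours * SECONDS_PER_HOUR)


total = daily_accesses(
    users=1_000_000, lookups_per_user=10,
    messages=25_000_000, lookups_per_message=5,
)
rate = accesses_per_second(total, busy_hours=12)  # assumed 12-hour window
print(total, rate)
```

Spreading the same total over a shorter busy period raises the per-second rate proportionally, so the assumed window has a large effect on sizing.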
- Place two Directory Servers in a multi-master configuration in one city to handle all write traffic. This configuration assumes that there should be a single point of control for all directory data.
- Use these supplier servers to replicate to one or more hub suppliers. The read, search, and compare requests serviced by the directory service should be targeted at the consumer servers, thereby freeing the supplier servers to handle write requests.
- Use the hub supplier to replicate to local sites throughout the enterprise. Replicating to local sites helps balance the workload of the servers and the WANs, as well as ensuring high availability of directory data.
- At each site, replicate at least once to ensure high availability, at least for read operations.
- Use DNS sort to ensure that local users always find a local Directory Server they can use for directory lookups.
7.3.8.3. Example Replication Strategy for a Small Site
- The entire enterprise is contained within a single building.
- The building has a very fast (100 MB per second) and lightly used network.
- The network is very stable, and the server hardware and OS platforms are reliable.
- A single server is capable of easily handling the site's load.
7.3.8.4. Example Replication Strategy for a Large Site
- The enterprise is contained within two separate buildings.
- There are slow connections between the buildings, and these connections are very busy during normal business hours.
- Choose a single server in one of the two buildings to contain a master copy of the directory data. This server should be placed in the building that contains the largest number of people responsible for the master copy of the directory data. This building is referred to as Building A.
- Replicate at least once within Building A for high availability of directory data. Use a multi-master replication configuration to ensure write-failover.
- Create two replicas in the second building (Building B).
- If there is no need for close consistency between the supplier and consumer server, schedule replication so that it occurs only during off-peak hours.
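Restricting replication to off-peak hours comes down to checking whether the current time falls inside an allowed window, including windows that run past midnight. The sketch below illustrates only that check; it does not use any Directory Server scheduling setting, and the function name `in_window` and the 22:00 to 05:00 window are hypothetical.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if `now` falls inside the window [start, end).

    Handles windows that wrap past midnight, such as 22:00 to 05:00.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Hypothetical off-peak window between the two buildings.
OFF_PEAK_START = time(22, 0)
OFF_PEAK_END = time(5, 0)

late_evening = in_window(time(23, 30), OFF_PEAK_START, OFF_PEAK_END)
midday = in_window(time(12, 0), OFF_PEAK_START, OFF_PEAK_END)
print(late_evening, midday)
```

A window that wraps midnight needs the second branch; comparing start and end directly would reject every time of day.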