Chapter 15. Troubleshooting replication-related problems
This chapter lists frequent error messages in replication environments, explains their possible causes, and offers remedies.
15.1. Configuring Directory Server to log replication-related errors
To log replication-related errors, enable replication debugging. The nsslapd-errorlog-level parameter is additive. This means that, to enable multiple logging features, you must add the values of the individual logging features and set the sum in nsslapd-errorlog-level.
Procedure
1. Display the current error log level:
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-errorlog-level
nsslapd-errorlog-level: 16384
2. The value to enable replication debugging is 8192. Set the nsslapd-errorlog-level parameter to 24576 (8192 + the previous value 16384) to enable replication debugging in addition to the currently enabled error logging features:
# dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-errorlog-level=24576
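Because every logging feature value is a distinct power of two, adding the values gives the same result as combining them with a bitwise OR, as long as you do not add a value that is already included. The following minimal sketch computes the new level with OR, so that applying it to a level that already contains 8192 does not count it twice. The function name add_repl_debug is hypothetical; it only does the arithmetic and does not call dsconf.

```shell
#!/bin/sh
# Replication debugging value for nsslapd-errorlog-level (from the procedure above).
REPL_DEBUG=8192

# Hypothetical helper: print the given level with replication debugging added.
# A bitwise OR keeps the result correct even if 8192 is already set,
# whereas plain addition would count it twice.
add_repl_debug() {
    echo $(( $1 | $REPL_DEBUG ))
}

add_repl_debug 16384   # prints 24576
add_repl_debug 24576   # prints 24576 (already enabled, value unchanged)
```

Pass the printed value to the dsconf config replace command from the procedure.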
15.2. Overview of replication-related errors, causes, and possible solutions
The following is an overview of replication-related errors and possible solutions:
agmt=agreement_name (host_name:port) Replica has a different generation ID than the local data
- Reason: The consumer specified in parentheses in the message has not been successfully initialized yet, or it was initialized from a different root supplier.
- Impact: The local supplier will not replicate any data to the consumer.
- Remedy: Ignore this message if it occurs before the consumer is initialized. Otherwise, if the message persists, reinitialize the consumer. In a multi-supplier environment, all servers need to be initialized only once from a root supplier, directly or indirectly. For example, server S1 initializes S2 and S4, S2 then initializes S3, and so on. Note that S2 must not start initializing S3 until the initialization of S2 is done. To verify this, check the total update status in the web console on S1 or in the error log of S1 or S2. Also, S2 should not initialize S1 back.
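The ordering rule in this remedy can be checked on paper before you start initializing servers. The following sketch is hypothetical: check_init_plan reads a planned initialization sequence, one "source target" pair per line in the order you intend to run them, treats the first source as the root supplier, and rejects a plan in which a server initializes another server before its own initialization is done, or in which any server (including the root) is initialized twice. It only validates the plan; it does not contact any server.

```shell
#!/bin/sh
# Hypothetical plan checker. Input: one "source target" pair per line,
# listed in the order the initializations will run. The first source
# is treated as the root supplier.
check_init_plan() {
    root=""
    ready=" "    # space-separated list of servers that hold initialized data
    while read -r src dst; do
        [ -n "$src" ] || continue          # skip empty lines
        if [ -z "$root" ]; then
            root=$src
            ready=" $root "
        fi
        # A server may initialize others only after its own initialization is done.
        case "$ready" in
            *" $src "*) ;;
            *) echo "error: $src initializes $dst before $src is initialized"; return 1 ;;
        esac
        # Every server is initialized only once, and never back from a consumer.
        case "$ready" in
            *" $dst "*) echo "error: $dst is already initialized"; return 1 ;;
        esac
        ready="$ready$dst "
    done
    echo "ok"
}

printf 'S1 S2\nS1 S4\nS2 S3\n' | check_init_plan                    # prints ok
printf 'S1 S2\nS2 S1\n' | check_init_plan || echo "plan rejected"   # S2 initializes S1 back
```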
Warning: data for replica’s was reloaded, and it no longer matches the data in the changelog. Recreating the changelog file. This could affect replication with replica’s consumers, in which case the consumers should be reinitialized.
- Reason: This message can appear only when you restart a supplier. It indicates that the supplier was unable to write the changelog or did not flush out its replica update vector (RUV) at its last shutdown. The former case usually happens because of a disk-space problem, and the latter case because a server crashed or was ungracefully shut down.
- Impact: The server cannot send changes to a consumer if the consumer's maxcsn value no longer exists in the server's changelog.
- Remedy: Check the disk space and look for possible core files in the server's logs directory. If this is a single-supplier replication, reinitialize the consumers. Otherwise, if the server later reports that it cannot locate change sequence numbers (CSN) for a consumer, verify whether the consumer can receive the CSN from other suppliers. If not, reinitialize the consumer.
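The first step of this remedy, checking disk space and looking for core files, can be scripted. The following is a minimal sketch: the directory paths assume the usual /var/lib/dirsrv/slapd-instance_name and /var/log/dirsrv/slapd-instance_name layout and must be adjusted for your instance, and the helpers free_kib and find_cores are hypothetical names built only on standard df, awk, and find.

```shell
#!/bin/sh
# Assumed default locations; replace instance_name with your instance name.
DB_DIR=${DB_DIR:-/var/lib/dirsrv/slapd-instance_name}
LOG_DIR=${LOG_DIR:-/var/log/dirsrv/slapd-instance_name}

# Print the free space, in KiB, of the file system that holds a directory.
# df -P prints one line per file system, so field 4 is "Available".
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# List files whose names start with "core" below a directory.
find_cores() {
    find "$1" -type f -name 'core*' 2>/dev/null
}

# Example run against the assumed instance directories:
# echo "Free space for $DB_DIR: $(free_kib "$DB_DIR") KiB"
# find_cores "$LOG_DIR"
```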
Too much time skew
- Reason: The system clocks on the host machines are extremely out of sync.
- Impact: Directory Server uses the system clock to generate a part of the CSN. To reflect the change sequence among multiple suppliers, suppliers forward-adjust their local clocks based on the remote clocks of the other suppliers. Because the adjustment is limited to a certain amount, any difference that exceeds the permitted limit causes the replication session to be aborted.
- Remedy: Synchronize the system clocks on the Directory Server host machines, for example, by configuring the chronyd service.
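To get a rough idea of how far apart two hosts are, you can compare their Unix timestamps. The following sketch is only an approximation, because the measured value also includes the SSH round-trip time; the host names and the clock_skew helper are hypothetical. On each host, chronyc tracking reports the local offset from the configured time sources more precisely.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute difference, in seconds, between
# two Unix timestamps, for example the output of `date +%s` on two hosts.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example with hypothetical host names (requires SSH access to both):
# clock_skew "$(ssh supplier1.example.com date +%s)" "$(ssh supplier2.example.com date +%s)"
#
# To keep the clocks synchronized, enable chronyd on every host:
# systemctl enable --now chronyd
```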
agmt=agreement_name (host_name:port): Warning: Unable to send endReplication extended operation (error_message)
- Reason: The consumer is not responding.
- Impact: If the consumer recovers without being restarted, the replica on the consumer might stay locked indefinitely if the consumer did not receive the release-lock message from the supplier.
- Remedy: Check whether the consumer receives any new changes from any of its suppliers, or start the replication monitor and see whether all suppliers of this consumer report that the replica is busy. If the replica appears to be locked indefinitely and no supplier can get in, restart the consumer.
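One way to check whether a consumer still receives changes is to compare the maximum CSN per supplier in the consumer's replica update vector (RUV) at two points in time. The following sketch assumes that the RUV is read from the RUV tombstone entry with OpenLDAP ldapsearch, using -o ldif-wrap=no so that the long nsds50ruv values stay on one line. The host name and suffix are placeholders, and read_ruv and ruv_maxcsns are hypothetical helpers. If the values do not change while the suppliers are processing writes, the consumer is not receiving updates.

```shell
#!/bin/sh
# Placeholder connection details; adjust for your consumer.
CONSUMER=${CONSUMER:-ldap://consumer.example.com}
SUFFIX=${SUFFIX:-dc=example,dc=com}

# Read the consumer's RUV from the RUV tombstone entry (prompts for the password).
read_ruv() {
    ldapsearch -LLL -o ldif-wrap=no -x -D "cn=Directory Manager" -W \
        -H "$CONSUMER" -b "$SUFFIX" \
        "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))" \
        nsds50ruv
}

# Print "replica_id max_csn" for every RUV element that carries CSNs.
# An element looks like: nsds50ruv: {replica ID URL} min_csn max_csn
ruv_maxcsns() {
    sed -n 's/^nsds50ruv: {replica \([0-9][0-9]*\) [^}]*} [^ ]* \([^ ]*\).*/\1 \2/p'
}

# Example: take two snapshots a few minutes apart and compare them.
# read_ruv | ruv_maxcsns > /tmp/ruv.1
# sleep 300
# read_ruv | ruv_maxcsns > /tmp/ruv.2
# diff /tmp/ruv.1 /tmp/ruv.2 && echo "no new changes received"
```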
Changelog is getting too big.
- Reason: Either changelog purge is turned off, which is the default setting, or changelog purge is turned on but some consumers are far behind the supplier.
- Remedy: By default, changelog purge is turned off. To turn it on from the command line, enter:
# dsconf -D "cn=Directory Manager" ldap://server.example.com replication set-changelog --max-age 1d --suffix "dc=example,dc=com"
1d means 1 day. Other valid time units are s for seconds, m for minutes, h for hours, and w for weeks. A value of 0 turns off the purge.
With changelog purge turned on, a purge thread that wakes up every five minutes removes a change if its age is greater than the value you set and if it has been replayed to all the direct consumers of this supplier or hub.
If it appears that the changelog is not purged when the purge threshold is reached, check the maximum time lag from the replication monitor among all the consumers. Irrespective of what the purge threshold is, no change will be purged before it is replayed by all the consumers.
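The --max-age units and the purge rule described above can be modeled in a few lines. The following is a minimal sketch, not the server's implementation: max_age_to_seconds and is_purgeable are hypothetical helpers, and treating a bare number as seconds is an assumption (the only unitless value this section mentions is 0).

```shell
#!/bin/sh
# Convert a --max-age value such as 30s, 10m, 12h, 1d, or 2w to seconds.
# 0 turns the purge off and is returned as 0.
max_age_to_seconds() {
    value=${1%[smhdw]}
    unit=${1#"$value"}
    case "$value" in
        ''|*[!0-9]*) echo "invalid max-age: $1" >&2; return 1 ;;
    esac
    case "$unit" in
        s|'') mult=1 ;;      # assumption: a bare number (such as 0) counts as seconds
        m) mult=60 ;;
        h) mult=3600 ;;
        d) mult=86400 ;;
        w) mult=604800 ;;
    esac
    echo $(( value * mult ))
}

# Sketch of the purge rule: a change is removed only if purge is on, the change
# is older than --max-age, and every direct consumer has already received it.
# Arguments: age of the change in seconds, --max-age value, "yes" if replayed.
is_purgeable() {
    max=$(max_age_to_seconds "$2") || return 2
    [ "$max" -gt 0 ] && [ "$1" -gt "$max" ] && [ "$3" = yes ]
}

max_age_to_seconds 1d                                          # prints 86400
is_purgeable 90000 1d no || echo "kept: a consumer is behind"
```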