Configuring the Red Hat Directory Server CSN behavior


Environment

Red Hat Directory Server 9.1.2 and later
Red Hat Directory Server 10.1 and later
Red Hat Enterprise Linux 6.9 and later
Red Hat Enterprise Linux 7.3 and later
Red Hat Identity Management 4.4 and later

Issue

Replication fails because of a missing change sequence number (CSN) in the local change log database.

Resolution

The underlying problems were tracked and fixed in the following upstream tickets:
- https://pagure.io/389-ds-base/issue/48995
- https://pagure.io/389-ds-base/issue/48999
- https://pagure.io/389-ds-base/issue/49000
The final fix eliminated the temporary parameter nsds5ReplicaIgnoreMissingChange, so that a missing CSN is always treated as fatal.
The fix appeared in:
- added 2017-08-01: RHEL 7 with 389-ds-base-1.3.6.1-16.el7, see https://access.redhat.com/errata/RHBA-2017:2086, tracked in https://bugzilla.redhat.com/1391700

Note that systems should always run the latest available RHEL updates.

History of the fix:
A temporary Red Hat Directory Server configuration parameter, nsds5ReplicaIgnoreMissingChange, was added to the replication configuration entry.
It was tracked in upstream ticket https://pagure.io/389-ds-base/issue/49020 (do not treat missing csn as fatal).
The parameter supported the following values:
* never or off: (Default) No alternative CSN is used. A missing CSN causes the replication to fail.
* once or on: A missing CSN is ignored once and an alternative CSN is used. If Directory Server is unable to also locate the alternative CSN, replication fails.
* always: A missing CSN is always ignored and the entry is never replicated.

The nsds5ReplicaIgnoreMissingChange parameter was set on a Directory Server instance as follows:

# ldapmodify -D "cn=Directory Manager" -W -x -p 389 -h server.example.test

dn: cn=*agreement_name*,cn=replica,cn=*suffixDN*,cn=mapping tree,cn=config
changetype: modify
replace: nsds5ReplicaIgnoreMissingChange
nsds5ReplicaIgnoreMissingChange: once
Ctrl+D

This setting did not require re-initializing the replication agreement or restarting the server to take effect.
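
The LDIF in the example above can also be generated rather than typed by hand. The following is a minimal sketch, not part of Directory Server: the function name ignore_missing_change_ldif is hypothetical, and it only checks the value against the list given in this article before printing the same four LDIF lines. It applies only to versions that still carried the temporary parameter.

```shell
# Hypothetical helper (not shipped with Directory Server): print the LDIF that
# sets nsds5ReplicaIgnoreMissingChange on one replication agreement entry.
# $1 = full DN of the agreement entry, $2 = value to set.
ignore_missing_change_ldif() {
    agreement_dn=$1
    value=$2

    if [ -z "$agreement_dn" ]; then
        echo "agreement DN must not be empty" >&2
        return 1
    fi

    # Accept only the values listed in this article.
    case "$value" in
        never|off|once|on|always) ;;
        *)
            echo "unsupported value: $value" >&2
            return 1
            ;;
    esac

    printf 'dn: %s\n' "$agreement_dn"
    printf 'changetype: modify\n'
    printf 'replace: nsds5ReplicaIgnoreMissingChange\n'
    printf 'nsds5ReplicaIgnoreMissingChange: %s\n' "$value"
}
```

The output can be piped into the same ldapmodify invocation shown above, for example: ignore_missing_change_ldif "cn=*agreement_name*,cn=replica,cn=*suffixDN*,cn=mapping tree,cn=config" once | ldapmodify -D "cn=Directory Manager" -W -x -p 389 -h server.example.test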

Root Cause

In a Red Hat Directory Server replication environment, the supplier selects a CSN before sending an update to the replica. If the supplier is unable to locate the CSN in the local change log database, the update cannot be sent and the supplier retries the process later. If the consumer was updated in the meantime by a different supplier, the selected CSN can be different and replication succeeds. However, if the supplier continuously fails to locate the CSN in the change log, replication of the entry fails.


3 Comments

The "real" fix is with upstream tickets #48995, #48999 and #49000, and will remove the temporary parameter nsds5ReplicaIgnoreMissingChange, to always treat a missing CSN as fatal.

This temporary change was also backported upstream to 389-ds-base-1.2.11 for RHEL 6, so the fix can be used on RHDS 9. See upstream ticket 49020 at https://fedorahosted.org/389/ticket/49020

Log sample:

[22/Dec/2016:09:17:50.319854151 -0500] - ERR - agmt="cn=meTo_localhost.localdomain:38942" (localhost:38942) - clcache_load_buffer - Can't locate CSN 585be002000000010000 in the changelog (DB rc=-30988). If replication stops, the consumer may need to be reinitialized.
[22/Dec/2016:09:17:50.321863791 -0500] - ERR - NSMMReplicationPlugin - changelog program - repl_plugin_name_cl - agmt="cn=meTo_localhost.localdomain:38942" (localhost:38942): CSN 585be002000000010000 not found, we aren't as up to date, or we purged
[22/Dec/2016:09:17:50.324031635 -0500] - ERR - NSMMReplicationPlugin - send_updates - agmt="cn=meTo_localhost.localdomain:38942" (localhost:38942): Data required to update replica has been purged from the changelog. If the error persists the replica must be reinitialized.
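
When reading such log lines, it can help to pull out the missing CSN and split it into its fields. This is a rough sketch with hypothetical function names (missing_csns, decode_csn). It assumes the usual 389-ds CSN layout of 20 hex digits: 8 for the timestamp in seconds since the epoch, then 4 each for the sequence number, the replica ID and the sub-sequence number.

```shell
# Hypothetical helpers for reading the log lines above.

# Print each distinct CSN that clcache_load_buffer lines on stdin report as missing.
missing_csns() {
    sed -n "s/.*Can't locate CSN \([0-9a-fA-F]\{20\}\) in the changelog.*/\1/p" | sort -u
}

# Split a 20-hex-digit CSN into decimal fields, assuming the layout
# timestamp(8) seqnum(4) replica-id(4) subseqnum(4).
decode_csn() {
    csn=$1
    case "$csn" in
        ""|*[!0-9a-fA-F]*)
            echo "not a hex CSN: $csn" >&2
            return 1
            ;;
    esac
    if [ "${#csn}" -ne 20 ]; then
        echo "expected 20 hex digits, got ${#csn}" >&2
        return 1
    fi
    ts=$(printf '%s' "$csn" | cut -c1-8)
    seq=$(printf '%s' "$csn" | cut -c9-12)
    rid=$(printf '%s' "$csn" | cut -c13-16)
    sub=$(printf '%s' "$csn" | cut -c17-20)
    # Convert each hex field to decimal with shell arithmetic.
    echo "timestamp=$((0x$ts)) seqnum=$((0x$seq)) replica_id=$((0x$rid)) subseqnum=$((0x$sub))"
}
```

For the CSN in the sample, decode_csn 585be002000000010000 prints timestamp=1482416130 seqnum=0 replica_id=1 subseqnum=0. That timestamp corresponds to 2016-12-22 14:15:30 UTC, about two minutes before the logged error (09:17:50 -0500). On a typical installation, missing_csns would be fed the instance's errors log; the exact path depends on the instance name.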

https://access.redhat.com/errata/RHBA-2016:25950 gives HTTP 404 Not Found error.

This is an unfortunate side effect of RHN changing into "subscription manager", when some links disappeared. I updated this "old" article, but note that the temporary parameter nsds5ReplicaIgnoreMissingChange was removed in the final fix. Always use the most recent updates; as of 2020-01-29 that is RHEL 8.1 with RHDS 11 (389-ds 1.4, from 2019-Nov-05):
- https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/
- https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/
- https://access.redhat.com/documentation/en-us/red_hat_directory_server/11/pdf/administration_guide/Red_Hat_Directory_Server-11-Administration_Guide-en-US.pdf
Thanks for pointing this out! Marc S.