15.24. Solving Common Replication Conflicts

Multi-master replication uses an eventual consistency replication model. This means that the same entry can be changed on different servers. When replication occurs between these servers, the conflicting changes need to be resolved. In most cases, resolution occurs automatically, based on the time stamp associated with the change on each server. The most recent change takes precedence.
However, there are some cases where conflicts require manual intervention in order to reach a resolution. Entries with a change conflict that cannot be resolved automatically by the replication process are called conflict entries.
To list conflict entries, enter:
# dsconf -D "cn=Directory Manager" ldap://server.example.com repl-conflict list dc=example,dc=com

15.24.1. Solving Naming Conflicts

When two entries are created with the same DN on different servers, the automatic conflict resolution procedure during replication renames the last entry created, including the entry's unique identifier in the DN. Every directory entry includes a unique identifier stored in the nsuniqueid operational attribute. When a naming conflict occurs, this unique ID is appended to the non-unique DN.
For example, if the uid=user_name,dc=example,dc=com entry was created on two different servers, replication adds the unique ID to the DN of the last entry created. As a result, the following entries exist:
  • uid=user_name,dc=example,dc=com
  • nsuniqueid=66446001-1dd211b2+uid=user_name,dc=example,dc=com
To resolve the replication conflict, you must manually decide how to proceed:
  • To keep only the valid entry (uid=user_name,dc=example,dc=com) and delete the conflict entry, enter:
    # dsconf -D "cn=Directory Manager" ldap://server.example.com repl-conflict delete nsuniqueid=66446001-1dd211b2+uid=user_name,dc=example,dc=com
  • To keep only the conflict entry (nsuniqueid=66446001-1dd211b2+uid=user_name,dc=example,dc=com), enter:
    # dsconf -D "cn=Directory Manager" ldap://server.example.com repl-conflict swap nsuniqueid=66446001-1dd211b2+uid=user_name,dc=example,dc=com
  • To keep both entries, enter:
    # dsconf -D "cn=Directory Manager" ldap://server.example.com repl-conflict convert --new-rdn=uid=user_name_NEW nsuniqueid=66446001-1dd211b2+uid=user_name,dc=example,dc=com
    To keep the conflict entry, you must specify a new Relative Distinguished Name (RDN) in the --new-rdn option to rename the conflict entry. The previous command renames the conflict entry to uid=user_name_NEW,dc=example,dc=com.
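
The conflict DN shown in this section has the form nsuniqueid=ID+original RDN,parent DN. The following is a minimal sketch, not part of Directory Server, of how a script could split such a DN back into the unique ID and the original DN before choosing between delete, swap, and convert. The function names conflict_unique_id and conflict_original_dn are hypothetical, and the sketch assumes that the DN starts with a lowercase nsuniqueid= and that the ID contains no plus sign.

```shell
#!/bin/sh
# Hypothetical helpers: split a conflict DN of the form
#   nsuniqueid=<ID>+<original RDN>,<parent DN>
# Assumes a lowercase "nsuniqueid=" prefix and no "+" inside the ID.

conflict_unique_id() {
    rest=${1#nsuniqueid=}          # drop the leading attribute name
    printf '%s\n' "${rest%%+*}"    # keep everything before the first "+"
}

conflict_original_dn() {
    # Remove the shortest prefix that matches "nsuniqueid=...+"
    printf '%s\n' "${1#nsuniqueid=*+}"
}

dn='nsuniqueid=66446001-1dd211b2+uid=user_name,dc=example,dc=com'
conflict_unique_id "$dn"      # prints 66446001-1dd211b2
conflict_original_dn "$dn"    # prints uid=user_name,dc=example,dc=com
```

The second value is the DN of the valid entry that the conflict entry collided with, which is the entry that the delete operation keeps and the swap operation replaces.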

15.24.2. Solving Orphan Entry Conflicts

When a delete operation is replicated and the consumer server finds that the entry to be deleted has child entries, the conflict resolution procedure creates a glue entry to avoid having orphaned entries in the directory.
In the same way, when an add operation is replicated and the consumer server cannot find the parent entry, the conflict resolution procedure creates a glue entry representing the parent so that the new entry is not an orphan entry.
Glue entries are temporary entries that include the object classes glue and extensibleObject. Glue entries can be created in several ways:
  • If the conflict resolution procedure finds a deleted entry with a matching unique identifier, the glue entry is a resurrection of that entry, with the addition of the glue object class and the nsds5ReplConflict attribute.
    In such cases, either modify the glue entry to remove the glue object class and the nsds5ReplConflict attribute to keep the entry as a normal entry or delete the glue entry and its child entries.
  • Otherwise, the server creates a minimal entry with the glue and extensibleObject object classes.
    In such cases, modify the entry to turn it into a meaningful entry or delete it and all of its child entries.
To list all glue entries:
# dsconf -D "cn=Directory Manager" ldap://server.example.com repl-conflict list-glue suffix
To delete a glue entry and its child entries:
# dsconf -D "cn=Directory Manager" ldap://server.example.com repl-conflict delete-glue DN_of_glue_entry
To convert a glue entry into a regular entry:
# dsconf -D "cn=Directory Manager" ldap://server.example.com repl-conflict convert-glue DN_of_glue_entry
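
Because glue entries carry the glue object class, they can also be spotted in LDIF output that was saved earlier, for example when reviewing an export offline. The following is a rough sketch under stated assumptions: the function name glue_dns is hypothetical, the sample LDIF is invented for illustration, and the input is assumed to be unwrapped LDIF (as produced with -o ldif-wrap=no) with entries separated by blank lines.

```shell
#!/bin/sh
# Hypothetical helper: print the DN of every LDIF entry on stdin
# that has the "glue" object class.
glue_dns() {
    awk 'BEGIN { RS = ""; FS = "\n" }   # paragraph mode: one record per entry
    {
        dn = ""; is_glue = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^dn: /) dn = substr($i, 5)
            # Attribute names and object class values are case-insensitive
            if (tolower($i) == "objectclass: glue") is_glue = 1
        }
        if (is_glue && dn != "") print dn
    }'
}

# Invented sample data, for illustration only
glue_dns <<'EOF'
dn: ou=Groups,dc=example,dc=com
objectClass: top
objectClass: organizationalUnit

dn: ou=Old,dc=example,dc=com
objectClass: top
objectClass: glue
objectClass: extensibleObject
EOF
```

On a running server, the repl-conflict list-glue command shown above remains the more direct way to find these entries.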

15.24.3. Resolving Errors for Obsolete or Missing Suppliers

Information about the replication topology, that is, all suppliers that send updates to each other and to other replicas within the same replication group, is stored in a set of metadata called the replica update vector (RUV). For each supplier, the RUV records its ID and URL, its latest change sequence number (CSN) on the local server, and the CSN of the first change. Both suppliers and consumers store RUV information, and they use it to control replication updates.
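
When reading RUV output, it can help to decode a CSN by hand. The following sketch assumes the conventional 20-hex-digit CSN layout: 8 digits for the time stamp in seconds since the epoch, 4 digits for a sequence number, 4 digits for the replica ID, and 4 digits for a subsequence number. The function name csn_decode is hypothetical; the sample CSNs are taken from the examples in this section.

```shell
#!/bin/sh
# Hypothetical helper: split a CSN into its four fields.
# Assumed layout: TTTTTTTT SSSS RRRR UUUU (all hexadecimal)
#   T = time stamp, S = sequence number, R = replica ID, U = subsequence number
csn_decode() {
    csn=$1
    ts_hex=$(printf '%s' "$csn" | cut -c1-8)
    seq_hex=$(printf '%s' "$csn" | cut -c9-12)
    rid_hex=$(printf '%s' "$csn" | cut -c13-16)
    sub_hex=$(printf '%s' "$csn" | cut -c17-20)
    # printf converts the 0x-prefixed hexadecimal values to decimal
    printf 'timestamp=%d seq=%d replica_id=%d subseq=%d\n' \
        "0x$ts_hex" "0x$seq_hex" "0x$rid_hex" "0x$sub_hex"
}

csn_decode 55e74b8c000000140000   # replica_id=20
csn_decode 50774211000500080000   # seq=5, replica_id=8
```

The decoded replica ID matches the ID shown in the braces of the corresponding RUV element, which gives a quick consistency check when an RUV looks suspicious.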
When one supplier is removed from the replication topology, it may remain in another replica's RUV. When the other replica is restarted, it can record errors in its log, warning that the replication plug-in does not recognize the removed supplier. The errors will look similar to the following example:
[22/Jan/2020:17:16:01 -0500] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 8 ldap://m2.example.com:389} 4aac3e59000000080000 4c6f2a02000000080000] which is present in RUV [database RUV]

<...several more samples...>

[22/Jan/2020:17:16:01 -0500] NSMMReplicationPlugin - replica_check_for_data_reload: Warning: for replica dc=example,dc=com there were some differences between the changelog max RUV and the database RUV. If there are obsolete elements in the database RUV, you should remove them using the CLEANALLRUV task. If they are not obsolete, you should check their status to see why there are no changes from those servers in the changelog.
Note which replica the messages refer to and its ID; in this case, replica 8.
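
If the log contains many such messages, the replica IDs can be collected with a small filter. This is a sketch only: the function name ruv_error_replica_ids is hypothetical, and it assumes that the messages follow the ruv_compare_ruv format shown in the example above.

```shell
#!/bin/sh
# Hypothetical helper: read error log lines on stdin and print the distinct
# replica IDs named in ruv_compare_ruv messages.
ruv_error_replica_ids() {
    sed -n '/ruv_compare_ruv/s/.*{replica \([0-9][0-9]*\) .*/\1/p' | sort -n -u
}

# The sample line is the one shown in the example above
ruv_error_replica_ids <<'EOF'
[22/Jan/2020:17:16:01 -0500] NSMMReplicationPlugin - ruv_compare_ruv: RUV [changelog max RUV] does not contain element [{replica 8 ldap://m2.example.com:389} 4aac3e59000000080000 4c6f2a02000000080000] which is present in RUV [database RUV]
EOF
# prints 8
```

To apply the filter to a real log, redirect the error log file of the instance to the function instead of using the inline sample.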
When the supplier is permanently removed from the topology, then any lingering metadata about that supplier should be purged from every other supplier's RUV entry. Use the cleanallruv directory task to remove a RUV entry from all suppliers in the topology.

Note

The cleanallruv task is replicated. Therefore, you only need to run it on one master.

Procedure 15.1. Removing an Obsolete or Missing Supplier Using the cleanallruv Task Operation

  1. List all RUV records and replica IDs, both valid and invalid, as deleted masters may have left metadata on other masters:
    # ldapsearch -o ldif-wrap=no -xLLL -H ldap://m1.example.com -D "cn=Directory Manager" -W -b dc=example,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))' nsDS5ReplicaId nsDS5ReplicaType nsds50ruv
    dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
    nsDS5ReplicaId: 1
    nsDS5ReplicaType: 3
    nsds50ruv: {replicageneration} 55d5093a000000010000
    nsds50ruv: {replica 1 ldap://m1.example.com:389} 55d57026000000010000 55d57275000000010000
    nsds50ruv: {replica 20 ldap://m2.example.com:389} 55e74b8c000000140000 55e74bf7000000140000
    nsds50ruv: {replica 9 ldap://m2.example.com:389}
    nsds50ruv: {replica 8 ldap://m2.example.com:389} 506f921f000000080000 50774211000500080000
    
    Note the returned replica IDs: 1, 20, 9, and 8.
  2. List the currently defined and valid replica IDs of all masters that replicate this database by searching for the cn=replica configuration entries under the cn=config suffix on each server.

    Note

    Consumers and read-only nodes always have the replica ID set to 65535, and nsDS5ReplicaType: 3 signifies a master.
    # for host in m1.example.com m2.example.com; do ldapsearch -o ldif-wrap=no -xLLL -H ldap://$host -D "cn=Directory Manager" -W -b cn=config cn=replica nsDS5ReplicaId nsDS5ReplicaType; done
    dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
    nsDS5ReplicaId: 1
    nsDS5ReplicaType: 3
    
    dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
    nsDS5ReplicaId: 20
    nsDS5ReplicaType: 3
    
    After you search all URIs returned in the first step (in this procedure, m1.example.com and m2.example.com), compare the list of returned masters (entries which have nsDS5ReplicaType: 3) to the list of RUVs from the previous step. In the above example, this search only returned IDs 1 and 20, but the previous search also returned 9 and 8 on URI m2.example.com. This means that replicas 9 and 8 have been removed from the topology, and their RUVs need to be cleaned.
  3. After determining which RUVs require cleaning, create a new cn=cleanallruv,cn=tasks,cn=config entry and provide the following information about your replication configuration:
    • The base DN of the replicated database (replica-base-dn)
    • The replica ID (replica-id)
    • Whether to force cleaning (replica-force-cleaning). If this attribute is set to no, the task first waits for all configured replicas to catch up with all changes from the removed replica, up to its maximum change sequence number (CSN), and then removes the RUV. If it is set to yes, the task removes all RUV entries immediately, and any updates that have not yet been replicated are missed.
    # dsconf -D "cn=Directory Manager" ldap://m2.example.com repl-tasks \
         cleanallruv --suffix="dc=example,dc=com" --replica-id=8

    Note

    The cleanallruv task is replicated. Therefore, you only need to run it on one master.
    Repeat the same for every RUV you want to clean (ID 9 in this procedure).
  4. After cleaning the RUVs of all replicas discovered earlier, you can again use the search from the first step to verify that all extra RUVs are removed:
    # ldapsearch -o ldif-wrap=no -xLLL -H ldap://m1.example.com -D "cn=Directory Manager" -W -b dc=example,dc=com '(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectclass=nstombstone))' nsDS5ReplicaId nsDS5ReplicaType nsds50ruv
    dn: cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config
    nsDS5ReplicaId: 1
    nsDS5ReplicaType: 3
    nsds50ruv: {replicageneration} 55d5093a000000010000
    nsds50ruv: {replica 1 ldap://m1.example.com:389} 55d57026000000010000 55d57275000000010000
    nsds50ruv: {replica 20 ldap://m2.example.com:389} 55e74b8c000000140000 55e74bf7000000140000
    
    As you can see in the above output, replica IDs 8 and 9 are no longer present, which indicates that their RUVs have been cleaned successfully.
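
The comparison in step 2 of this procedure is a set difference: the replica IDs that appear in the RUV, minus the replica IDs that are still configured. The following sketch shows one possible way to script it. The function names are hypothetical, and the sketch assumes that the output of the two searches was saved as unwrapped LDIF text; the sample values are the ones from this procedure.

```shell
#!/bin/sh
# Hypothetical helpers: extract replica IDs from saved search output.
ruv_ids()   { sed -n 's/^ *nsds50ruv: {replica \([0-9][0-9]*\) .*/\1/p'; }
valid_ids() { sed -n 's/^ *nsDS5ReplicaId: *\([0-9][0-9]*\) *$/\1/p'; }

# $1: output of the RUV search (step 1)
# $2: combined output of the cn=replica searches on all servers (step 2)
# Prints every replica ID that is in the RUV but not in the configuration.
obsolete_replica_ids() {
    valid=$(printf '%s\n' "$2" | valid_ids)
    printf '%s\n' "$1" | ruv_ids | sort -n -u | while read -r id; do
        printf '%s\n' "$valid" | grep -qx "$id" || printf '%s\n' "$id"
    done
}

ruv_output='nsds50ruv: {replicageneration} 55d5093a000000010000
nsds50ruv: {replica 1 ldap://m1.example.com:389} 55d57026000000010000 55d57275000000010000
nsds50ruv: {replica 20 ldap://m2.example.com:389} 55e74b8c000000140000 55e74bf7000000140000
nsds50ruv: {replica 9 ldap://m2.example.com:389}
nsds50ruv: {replica 8 ldap://m2.example.com:389} 506f921f000000080000 50774211000500080000'

config_output='nsDS5ReplicaId: 1
nsDS5ReplicaType: 3
nsDS5ReplicaId: 20
nsDS5ReplicaType: 3'

obsolete_replica_ids "$ruv_output" "$config_output"   # prints 8 and 9, one per line

# The result could then drive the cleanallruv command from step 3, for example:
# for id in $(obsolete_replica_ids "$ruv_output" "$config_output"); do
#     dsconf -D "cn=Directory Manager" ldap://m2.example.com repl-tasks \
#         cleanallruv --suffix="dc=example,dc=com" --replica-id="$id"
# done
```

Review the printed IDs before cleaning anything: an ID can be missing from the configuration output simply because one server was not searched in step 2.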