When a RHEL 6.1 cluster node reboots and rejoins before it has been fenced, the rest of the cluster does not recover (dlm, fenced, rgmanager stuck)

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux (RHEL) 6.1 with the High Availability Add-On
  • cman/clusterlib 3.0.12-41.el6_1

Issue

  • When a node reboots on its own (because of a power failure or another reason) and attempts to rejoin the cluster before the token timeout has expired or before it has been fenced, that node's corosync is killed (the node is not allowed back into the cluster), and the rgmanager services do not relocate automatically.

  • When a cluster node rebooted (not as a result of fencing), the services did not fail over to the standby node.

Resolution

  • Update to cman/clusterlib 3.0.12.1-23.el6 or later.
  • Alternative workaround: set the totem token timeout in /etc/cluster/cluster.conf to a value shorter than the time it takes a node to reboot, so that the node's token has always expired before the node can attempt to rejoin the cluster. For example (the value is in milliseconds):

    <totem token="10000"/>
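
The workaround comes down to comparing two numbers. As a minimal sketch, the following Python reads the token value out of a cluster.conf string and compares it with a node's boot time. The helper names, the cluster name, and the 90-second boot time are hypothetical; measure the real time on your own hardware, from the moment a node goes down until it would try to rejoin the cluster.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def totem_token_ms(cluster_conf_xml: str) -> Optional[int]:
    """Return the <totem token="..."> value (milliseconds) from cluster.conf text,
    or None when no explicit token attribute is set."""
    root = ET.fromstring(cluster_conf_xml)  # root element is <cluster>
    totem = root.find("totem")              # <totem> is a direct child of <cluster>
    if totem is None or "token" not in totem.attrib:
        return None
    return int(totem.attrib["token"])


def token_expires_before_rejoin(token_ms: int, boot_seconds: float) -> bool:
    """True if the token would expire before a rebooting node could be back up."""
    return token_ms < boot_seconds * 1000


# Example values: the cluster name and the 90-second boot time are made up.
conf = """<cluster name="example" config_version="1">
  <totem token="10000"/>
</cluster>"""
token = totem_token_ms(conf)
print(token, token_expires_before_rejoin(token, boot_seconds=90))  # 10000 True
```

If the function returns False for your measured boot time, a node could come back before its token expires, which is the situation described under Issue.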
    

Root Cause

This issue was investigated and resolved via Red Hat Bugzilla #663397.

When a node rejoins before its token has expired, cman delivers node-down and node-up events for it back to back in quick succession. For each event, the daemons (fenced, dlm_controld, gfs_controld) receive a cman callback indicating that the member state has changed, and then query the membership to see what changed. Sometimes the first membership query already includes the node's re-addition, so the daemons do not detect through cman/quorum that the node left and rejoined (they do detect it through cpg). Because the cman/quorum events drive part of the recovery process, that part of recovery never runs.
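
To illustrate the race, here is a toy Python model. FakeCman and naive_daemon are hypothetical and are not the libcman API: membership changes take effect immediately while the callbacks only queue up, and on each callback the daemon compares nothing but the sets of node ids.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FakeCman:
    """Toy model of cman membership (hypothetical; not the libcman API).

    Membership changes are applied immediately, but the matching callbacks are
    only queued, so a daemon may handle them after several changes have happened.
    """
    members: Set[int] = field(default_factory=set)
    pending: List[str] = field(default_factory=list)

    def node_down(self, nodeid: int) -> None:
        self.members.discard(nodeid)
        self.pending.append("member state changed")

    def node_up(self, nodeid: int) -> None:
        self.members.add(nodeid)
        self.pending.append("member state changed")

    def query_members(self) -> Set[int]:
        # Returns the membership as it is now, not as it was when the callback was queued.
        return set(self.members)


def naive_daemon(cman: FakeCman, old: Set[int]) -> List[str]:
    """Handle queued callbacks by re-querying membership and diffing node-id sets."""
    seen: List[str] = []
    while cman.pending:
        cman.pending.pop(0)
        new = cman.query_members()
        seen += [f"node {n} left" for n in sorted(old - new)]
        seen += [f"node {n} joined" for n in sorted(new - old)]
        old = new
    return seen


# Both events are queued before the daemon runs: the leave/rejoin is invisible.
cman = FakeCman({1, 2, 3})
cman.node_down(3)
cman.node_up(3)
print(naive_daemon(cman, {1, 2, 3}))  # []
```

If the daemon happens to run between the two events, it does report "node 3 left" and then "node 3 joined"; the failure only appears when the first query already sees the node back.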

In dlm_controld, the result is that the /sys/kernel/config/dlm/cluster/comms/<nodeid> directory is not removed. The old dlm connection is therefore not closed and no new one is created, so recovery messages go nowhere and dlm recovery gets stuck. As a consequence, userland daemons get stuck joining their lockspaces.
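
A minimal Python sketch of the configfs bookkeeping described above follows. It assumes, as the text says, that dlm_controld keeps one directory per peer under comms/, removes it on node down, and creates it again on node up. The function names are hypothetical. The real daemon also populates attribute files inside each directory, which this sketch omits. The base directory is a parameter so the logic can be tried in a scratch directory.

```python
from pathlib import Path

# Path from the text above; pass a different base directory to experiment safely.
COMMS_DIR = Path("/sys/kernel/config/dlm/cluster/comms")


def on_node_down(nodeid: int, comms_dir: Path = COMMS_DIR) -> bool:
    """Remove comms/<nodeid> so the old dlm connection to that node is closed.

    Returns True if a directory was removed. (On configfs, rmdir removes the item
    even though it exposes attribute files; in an ordinary directory it must be empty.)
    """
    path = comms_dir / str(nodeid)
    if path.is_dir():
        path.rmdir()
        return True
    return False


def on_node_up(nodeid: int, comms_dir: Path = COMMS_DIR) -> bool:
    """Create a fresh comms/<nodeid> for a (re)joined node.

    Returns False if the directory is still there, i.e. the node-down step was
    missed and the stale entry was never cleaned up.
    """
    path = comms_dir / str(nodeid)
    if path.exists():
        return False
    path.mkdir()
    return True
```

In the failure described above, on_node_down is never called for the rejoining node, so on_node_up finds the stale directory and returns False, and no new connection is set up.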

The fix in the daemons is fairly simple: compare the cluster incarnation numbers that cman already returns. If they differ, the node left and rejoined between membership queries.
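
To make the fix concrete, here is a minimal Python sketch of a membership diff that compares incarnation numbers as well as node ids. It assumes that each membership query yields, for every member, the incarnation number cman reports for that node. The dict representation and the function name are hypothetical. With a plain node-id comparison, node 3 below would look unchanged.

```python
from typing import Dict, List, Tuple

# nodeid -> incarnation number reported for that member. The dict shape is an
# assumption for illustration, not a real cman structure.
Snapshot = Dict[int, int]


def diff_members(old: Snapshot, new: Snapshot) -> Tuple[List[int], List[int]]:
    """Return (left, joined) node ids between two membership queries.

    A node that appears in both queries with a different incarnation number
    left and rejoined in between, so it is reported as both leaving and joining.
    """
    rejoined = {n for n in old.keys() & new.keys() if old[n] != new[n]}
    left = sorted((old.keys() - new.keys()) | rejoined)
    joined = sorted((new.keys() - old.keys()) | rejoined)
    return left, joined


before = {1: 100, 2: 100, 3: 100}
after = {1: 100, 2: 100, 3: 104}   # node 3 left and rejoined between queries
print(diff_members(before, after))  # ([3], [3])
```

Reporting the rejoined node in both lists lets the daemon run its normal node-down recovery (for example, removing the stale comms directory) before the node-up handling.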

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
