When a RHEL 6.1 cluster node reboots and rejoins before it has been fenced, the rest of the cluster does not recover (dlm, fenced, rgmanager stuck)
Environment
- Red Hat Enterprise Linux (RHEL) 6.1 with the High Availability Add-On
- cman/clusterlib 3.0.12-41.el6_1
Issue
- When a node reboots on its own (power failure or other reason) and attempts to rejoin the cluster before the token timeout has expired or the node has been fenced, that node's corosync is killed (it is not allowed back into the cluster) and the rgmanager services do not relocate automatically.
- When a cluster node rebooted (not due to fencing), the services did not fail over to the standby node.
Resolution
- Update to cman/clusterlib 3.0.12.1-23.el6 or later.
- Alternative workaround: set the totem token in /etc/cluster/cluster.conf to a value less than the time it takes for a node to boot, so that it cannot rejoin the cluster before its token has expired. For example (value is in milliseconds):
<totem token="10000"/>
Root Cause
This issue was investigated and resolved via Red Hat Bugzilla #663397.
When a node rejoins before its token has expired, cman delivers the node down and node up events for it back to back. The daemons (fenced, dlm_controld, gfs_controld) receive a cman callback for each event indicating that the member state has changed, and then query the membership to see what has changed. Sometimes the first membership query already includes the node addition, so the daemons do not detect via cman/quorum that the node left and rejoined (they do detect it via cpg). The cman/quorum events drive a part of the recovery process, and that part never gets done.
In dlm_controld, the result is that the /sys/kernel/config/dlm/cluster/comms/<nodeid> directory is not removed. The old dlm connection is therefore not closed and a new one is not created, so the recovery messages go nowhere and dlm recovery gets stuck. Userland daemons then get stuck joining their lockspaces.
The solution in the daemons is fairly simple: compare the cluster incarnation numbers that cman is already returning, and if they are different then the node left and rejoined between membership queries.
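The comparison described above can be illustrated with a minimal sketch. This is not the daemons' actual code (they are written in C). It assumes each membership query yields, per node ID, a member flag and an incarnation number, and the names NodeState and detect_changes are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    """Hypothetical per-node result of one membership query."""
    member: bool       # True if the node is currently a cluster member
    incarnation: int   # cluster incarnation number reported for the node


def detect_changes(old, new):
    """Compare two membership snapshots (dicts of node ID -> NodeState).

    Returns three sets of node IDs: (left, joined, rejoined).
    A node is "rejoined" when it is a member in both snapshots but its
    incarnation number differs, meaning it left and came back between
    the two queries.
    """
    left, joined, rejoined = set(), set(), set()
    for nodeid in old.keys() | new.keys():
        was = old.get(nodeid)
        now = new.get(nodeid)
        was_member = was is not None and was.member
        is_member = now is not None and now.member
        if was_member and not is_member:
            left.add(nodeid)
        elif is_member and not was_member:
            joined.add(nodeid)
        elif was_member and is_member and was.incarnation != now.incarnation:
            # Membership flag alone shows no change; only the incarnation
            # number reveals the leave/rejoin that happened in between.
            rejoined.add(nodeid)
    return left, joined, rejoined
```

In this sketch, a node reported in the rejoined set would be handled as a node down followed by a node up, so that the recovery steps normally driven by the cman/quorum events are not skipped.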