clustat and clusvcadm hang, cluster services stop functioning, and cluster in "wait state: messages" after temporary network split and rejoin in RHEL 6
Issue
- After a short network "split" and recovery in which all nodes see each other leave and then rejoin, the cluster stops functioning
- GFS2 is not accessible after "processor failure" with all or most nodes dropping and then coming back without a full restart or fencing
- Nodes are not fenced after token loss, membership dropping below quorum, and the nodes then rejoining
- clustat hangs on all nodes
- Cluster-related commands hang; no output is returned
- Hundreds of sleeping clustat processes on one or more nodes
- I see all the nodes go down to 1 remaining member, then come back to full membership, but nothing in the cluster works afterwards
Aug 23 10:33:29 node1 corosync[29454]: [QUORUM] Members[4]: 1 2 3 4
Aug 23 10:33:29 node1 corosync[29454]: [QUORUM] Members[3]: 1 2 3
Aug 23 10:33:29 node1 corosync[29454]: [QUORUM] Members[2]: 1 2
Aug 23 10:33:29 node1 corosync[29454]: [QUORUM] Members[1]: 1
Aug 23 10:33:29 node1 corosync[29454]: [CMAN ] quorum lost, blocking activity
Aug 23 10:33:31 node1 corosync[29454]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Aug 23 10:33:31 node1 corosync[29454]: [QUORUM] Members[2]: 1 2
Aug 23 10:33:31 node1 corosync[29454]: [QUORUM] Members[3]: 1 2 3
Aug 23 10:33:31 node1 corosync[29454]: [QUORUM] Members[4]: 1 2 3 4
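The same membership collapse and rejoin can be confirmed from each node's system log by pulling out the corosync transitions; a minimal sketch, assuming the default RHEL 6 log location of /var/log/messages:

# grep -E 'corosync.*(TOTEM|QUORUM|CMAN)' /var/log/messages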
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
- Cluster with more than 2 nodes
- fence_tool ls or cman_tool services reports "wait state messages" and "victim count" > 0
# fence_tool ls
fence domain
member count 4
victim count 3
victim now 0
master nodeid 1
wait state messages
members 1 2 3 4
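When the fence domain is stuck in "wait state messages", capturing the group state from every node makes it easier to see where each daemon stopped; a minimal sketch, assuming passwordless SSH and placeholder node names node1 through node4:

# for node in node1 node2 node3 node4; do
>     ssh $node 'fence_tool ls; cman_tool services; dlm_tool ls' > /tmp/cluster-state-$node.txt
> done

Comparing the "wait state" and "victim count" fields across nodes shows whether all members are blocked on the same pending fence messages.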