All nodes in a RHEL 5 or 6 cluster were powered off or fenced when using a quorum disk and a "last man standing" configuration
Issue
- During routine operation, some cluster nodes were fenced, and ultimately all of the nodes were shut down by the clustering software.
- Two nodes in a 3-node cluster can race to fence each other while the 3rd node is down.
- In a "last man standing" quorum disk configuration, one node was fenced by two different nodes almost simultaneously, and that node never powered back on:
Mar 12 05:49:34 node1 fenced[11075]: fencing node "node3.example.com"
Mar 12 05:49:39 node2 fenced[11044]: fencing node "node3.example.com"
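
To check collected logs for the same pattern, the following is a minimal sketch, not a Red Hat tool. It assumes that the syslog messages from all nodes have been gathered into one list of lines in the format shown above. The function name `find_overlapping_fences` and the 30-second default window are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches fenced messages in the syslog format shown above.
FENCE_RE = re.compile(
    r'^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+'
    r'fenced\[\d+\]:\s+fencing node "(?P<victim>[^"]+)"'
)


def find_overlapping_fences(lines, window_seconds=30):
    """Return {victim: [(timestamp, fencing_host), ...]} for every node that
    was fenced by two different hosts within window_seconds of each other.

    Hypothetical helper; the window length is an assumption, not a cluster
    setting.
    """
    events = defaultdict(list)
    for line in lines:
        match = FENCE_RE.match(line.strip())
        if not match:
            continue
        # Syslog timestamps carry no year, so a fixed year is assumed here.
        # It is only used to compare timestamps with each other.
        ts = datetime.strptime("2000 " + match.group("ts"), "%Y %b %d %H:%M:%S")
        events[match.group("victim")].append((ts, match.group("host")))

    window = timedelta(seconds=window_seconds)
    overlaps = {}
    for victim, hits in events.items():
        hits.sort()
        # Compare each fence event with the next one in time order.
        for (t1, h1), (t2, h2) in zip(hits, hits[1:]):
            if h1 != h2 and t2 - t1 <= window:
                overlaps[victim] = hits
                break
    return overlaps
```

Passing the two log lines above to `find_overlapping_fences` reports `node3.example.com` as fenced by both `node1` and `node2`, because the two events are 5 seconds apart.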
Environment