All nodes in a RHEL 5 or 6 cluster were powered off or fenced when using a quorum disk and a "last man standing" configuration

Solution Unverified - Updated

Issue

  • During routine operation, some cluster nodes were fenced, and ultimately all of the nodes were shut down by the clustering software.
  • Two nodes in a 3-node cluster can race to fence each other while the 3rd node is down.
  • In a last-man standing quorum-disk configuration, one node was fenced by two different nodes simultaneously, resulting in that node never powering back on.
  Mar 12 05:49:34 node1 fenced[11075]: fencing node "node3.example.com"
  Mar 12 05:49:39 node2 fenced[11044]: fencing node "node3.example.com"

Environment

  • Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add On
  • Cluster utilizing a quorum device (<quorumd> in /etc/cluster/cluster.conf)
    • Cluster setup for a "last man standing" ([1], [2], [3]) configuration.
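As a point of reference, a "last man standing" configuration typically gives the quorum disk N-1 votes in an N-node cluster, so a single surviving node plus the quorum disk can retain quorum. The following /etc/cluster/cluster.conf fragment is a minimal sketch of such a setup for a 3-node cluster; the cluster name, node names, device path, and timing values are illustrative assumptions, not taken from the affected environment:

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch only: names, device path, and timings are assumptions -->
<cluster name="examplecluster" config_version="1">
  <!-- 3 nodes x 1 vote + 2 qdisk votes = 5 expected votes;
       quorum is 3, so one node (1) + qdisk (2) keeps the cluster quorate -->
  <cman expected_votes="5"/>
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1" votes="1"/>
    <clusternode name="node2.example.com" nodeid="2" votes="1"/>
    <clusternode name="node3.example.com" nodeid="3" votes="1"/>
  </clusternodes>
  <!-- qdisk carries N-1 votes for last-man-standing behavior -->
  <quorumd interval="1" tko="10" votes="2" label="examplequorum"/>
</cluster>
```

With this vote layout, any single node that can still write to the quorum device remains quorate even after the other two nodes are lost, which is what makes the fencing behavior described above possible when nodes disagree about membership.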
