Rebooted node rejoining cluster before token loss registered still gets fenced and rgmanager service recovery delayed until subsequent rejoin in RHEL 6

Solution In Progress - Updated -

Issue

  • A node gets rebooted for reasons outside of the cluster (power blip, kernel panic, etc.). If it reboots and attempts to start cman before the remaining nodes have registered a token loss, that node is fenced instead of being allowed back into the cluster. Services that were running on that node are not recovered until it rejoins again.
  • Cluster services do not fail over right away when a node has rebooted on its own and attempted to rejoin the cluster.
  • After a node was removed from the cluster and quickly rejoined, all cluster services became stuck or blocked.

Environment

  • Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
  • <totem token="xxxxx"/> set to longer than the amount of time it takes a node to boot and start the cman service.
  • <fence_daemon post_fail_delay="yyyyy"/> set to less than the amount of time it takes the cman service to start on a node.
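The two configuration conditions above can be checked against a cluster.conf with a short script. The following is only a minimal sketch, not a Red Hat tool: the function name, the sample configuration, and the timing inputs are hypothetical. It assumes that token is expressed in milliseconds and post_fail_delay in seconds, and that the fallback defaults used when an attribute is absent (10000 ms and 0 s) match your release. Verify both against your own documentation before relying on the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample configuration; the values are for illustration only.
SAMPLE_CLUSTER_CONF = """\
<cluster name="example" config_version="1">
  <totem token="300000"/>
  <fence_daemon post_fail_delay="0"/>
</cluster>
"""

# Assumed fallbacks when an attribute is not set; confirm for your release.
ASSUMED_DEFAULT_TOKEN_MS = 10000
ASSUMED_DEFAULT_POST_FAIL_DELAY_S = 0


def matches_environment(conf_xml, boot_to_cman_seconds, cman_start_seconds):
    """Return True if the configuration matches both Environment conditions.

    boot_to_cman_seconds: measured time for a node to boot and start cman.
    cman_start_seconds: measured time for the cman service itself to start.
    """
    root = ET.fromstring(conf_xml)

    totem = root.find("totem")
    fence_daemon = root.find("fence_daemon")

    token_ms = ASSUMED_DEFAULT_TOKEN_MS
    if totem is not None:
        token_ms = int(totem.get("token", ASSUMED_DEFAULT_TOKEN_MS))

    post_fail_delay_s = ASSUMED_DEFAULT_POST_FAIL_DELAY_S
    if fence_daemon is not None:
        post_fail_delay_s = int(
            fence_daemon.get("post_fail_delay", ASSUMED_DEFAULT_POST_FAIL_DELAY_S)
        )

    # Condition 1: the token timeout outlasts the node's boot plus cman start.
    token_outlasts_boot = (token_ms / 1000.0) > boot_to_cman_seconds
    # Condition 2: post_fail_delay is shorter than the cman start time.
    delay_shorter_than_start = post_fail_delay_s < cman_start_seconds

    return token_outlasts_boot and delay_shorter_than_start


if __name__ == "__main__":
    # Hypothetical timings: 120 s to boot and start cman, 30 s for cman alone.
    print(matches_environment(SAMPLE_CLUSTER_CONF, 120, 30))
```

The timing arguments have to be measured on the actual nodes. The script only compares them with the configured values and does not query a running cluster.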
