Node is never fenced after a token loss and being killed for rejoining the cluster without a restart in RHEL 5.6

Solution Unverified - Updated -

Issue

  • A token loss occurred and the node was killed when it attempted to rejoin the cluster without a restart, but it was never fenced.
  • gfs_controld on a remaining node in the cluster logs repeated cpg_mcast_joined retries while the removed node remains unfenced:
      Apr 30 22:04:58 node4 gfs_controld[10877]: cpg_mcast_joined retry 100 MSG_PLOCK
      Apr 30 22:04:58 node4 gfs_controld[10877]: cpg_mcast_joined retry 200 MSG_PLOCK
  • rgmanager becomes stuck waiting for the node to be fenced:
      Apr 30 22:05:02 node4 clurgmgrd[13935]: <info> Waiting for node #1 to be fenced
  • group_tool dump shows the node removal, but does not show groupd ever processing the confchg that should follow it:
      1335837897 cman: node 1 removed
      1335837897 add_recovery_set_cman nodeid 1
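
The syslog symptoms listed above can be checked for with a short script. The following is a minimal sketch, not a supported diagnostic tool: the function name scan_syslog and the two regular expressions are assumptions derived only from the log excerpts in this article, and message formats may differ in other releases.

```python
import re

# Patterns derived from the log excerpts in this article (assumed formats).
RETRY_RE = re.compile(r"gfs_controld\[\d+\]: cpg_mcast_joined retry (\d+)")
WAIT_RE = re.compile(r"clurgmgrd\[\d+\]: <info> Waiting for node #(\d+) to be fenced")


def scan_syslog(lines):
    """Return (highest retry count seen, set of node IDs awaiting fencing)."""
    max_retry = 0
    waiting_nodes = set()
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            # Track the largest retry counter reported by gfs_controld.
            max_retry = max(max_retry, int(match.group(1)))
            continue
        match = WAIT_RE.search(line)
        if match:
            # Record each node ID that rgmanager is waiting on.
            waiting_nodes.add(int(match.group(1)))
    return max_retry, waiting_nodes


# Sample input copied from the excerpts above.
sample = [
    "Apr 30 22:04:58 node4 gfs_controld[10877]: cpg_mcast_joined retry 100 MSG_PLOCK",
    "Apr 30 22:04:58 node4 gfs_controld[10877]: cpg_mcast_joined retry 200 MSG_PLOCK",
    "Apr 30 22:05:02 node4 clurgmgrd[13935]: <info> Waiting for node #1 to be fenced",
]
print(scan_syslog(sample))
```

A climbing retry count together with a non-empty set of waiting node IDs matches the pattern described in this issue. To scan a live system, pass an open log file object (for example /var/log/messages) in place of the sample list.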

Environment

  • Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add-On
  • openais prior to release 0.80.6-28.el5_6.2 (RHEL 5.6), 0.80.6-30.el5_7.1 (RHEL 5.7), or 0.80.6-36.el5 (RHEL 5.8)
