One or more nodes are killed in a RHEL 6 cluster after nodes report "fenced[xxxx]: telling cman to remove nodeid 2 from cluster"
Issue
- After a network problem disrupted cluster communication and communication was then restored, one node killed the other:
Feb 6 17:10:50 node1 corosync[1827]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 6 17:10:50 node1 corosync[1827]: [QUORUM] Members[2]: 1 2
Feb 6 17:10:50 node1 corosync[1827]: [QUORUM] Members[2]: 1 2
Feb 6 17:10:50 node1 corosync[1827]: [CPG ] chosen downlist: sender r(0) ip(172.22.0.210) ; members(old:1 left:0)
Feb 6 17:10:50 node1 corosync[1827]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 6 17:10:50 node1 gfs_controld[1959]: receive_start 2:4 add node with started_count 3
Feb 6 17:10:50 node1 dlm_controld[1910]: receive_start 2:4 add node with started_count 3
Feb 6 17:10:50 node1 dlm_controld[1910]: receive_start 2:4 add node with started_count 3
Feb 6 17:10:50 node1 dlm_controld[1910]: receive_start 2:4 add node with started_count 3
Feb 6 17:10:51 node1 rgmanager[2586]: State change: node2 UP
Feb 6 17:10:53 node1 fenced[1884]: telling cman to remove nodeid 2 from cluster
Feb 6 17:10:30 node2 corosync[1984]: cman killed by node 1 because we were killed by cman_tool or other application
Feb 6 17:10:30 node2 fenced[2041]: telling cman to remove nodeid 1 from cluster
- Both nodes in a two-node cluster killed each other, each reporting "fenced[xxxx]: telling cman to remove nodeid X from cluster"
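
To check whether a node's logs show this symptom, the two messages quoted above can be searched for programmatically. The following is a minimal sketch, not a supported tool: the function name `find_kill_events` and the event labels are hypothetical, and the patterns are assumed from the log excerpts in this article only, so other releases may word the messages differently.

```python
import re

# Patterns assumed from the log excerpts in this article; the pid and
# nodeid values vary from cluster to cluster.
FENCED_KILL = re.compile(
    r"(?P<host>\S+) fenced\[\d+\]: telling cman to remove nodeid (?P<nodeid>\d+) from cluster"
)
CMAN_KILLED = re.compile(
    r"(?P<host>\S+) corosync\[\d+\]: cman killed by node (?P<nodeid>\d+)"
)


def find_kill_events(lines):
    """Return (event, reporting_host, nodeid) tuples for matching log lines.

    "requested_kill": fenced on reporting_host asked cman to remove nodeid.
    "was_killed":     corosync on reporting_host reports being killed by nodeid.
    """
    events = []
    for line in lines:
        match = FENCED_KILL.search(line)
        if match:
            events.append(("requested_kill", match.group("host"), int(match.group("nodeid"))))
            continue
        match = CMAN_KILLED.search(line)
        if match:
            events.append(("was_killed", match.group("host"), int(match.group("nodeid"))))
    return events


# Sample input copied from the excerpts above.
sample = [
    "Feb 6 17:10:51 node1 rgmanager[2586]: State change: node2 UP",
    "Feb 6 17:10:53 node1 fenced[1884]: telling cman to remove nodeid 2 from cluster",
    "Feb 6 17:10:30 node2 corosync[1984]: cman killed by node 1 because we were killed by cman_tool or other application",
    "Feb 6 17:10:30 node2 fenced[2041]: telling cman to remove nodeid 1 from cluster",
]

for event in find_kill_events(sample):
    print(event)
```

To scan a real log, pass an open file object instead of `sample`, for example the syslog file on each node (assumed here to be `/var/log/messages`). Seeing "requested_kill" events from both nodes, each naming the other, corresponds to the second symptom listed above.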
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add-On
- A similar issue exists in RHEL 5 clusters