Whenever a node in a RHEL High Availability cluster gets fenced, another node stops responding and is also fenced
Issue
- We have a three-node cluster. When we pull the network cable for node 2, it gets fenced. Immediately after that fencing is initiated, node 3 becomes unresponsive and is also fenced.
Feb 23 17:52:44 node1 corosync[3149]: [TOTEM ] A processor failed, forming new configuration.
Feb 23 17:52:56 node1 corosync[3149]: [QUORUM] Members[2]: 1 3
Feb 23 17:52:56 node1 corosync[3149]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 23 17:52:56 node1 kernel: dlm: closing connection to node 2
Feb 23 17:52:56 node1 rgmanager[3804]: State change: node2.example.com DOWN
Feb 23 17:52:56 node1 corosync[3149]: [CPG ] chosen downlist: sender r(0) ip(192.168.1.102) ; members(old:3 left:1)
Feb 23 17:52:56 node1 corosync[3149]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 23 17:52:57 node1 fenced[3258]: fencing node node2.example.com
Feb 23 17:53:12 node1 corosync[3149]: [TOTEM ] A processor failed, forming new configuration.
Feb 23 17:53:18 node1 fenced[3258]: fence node2.example.com success
Feb 23 17:53:24 node1 corosync[3149]: [CMAN ] quorum lost, blocking activity
Feb 23 17:53:24 node1 corosync[3149]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Feb 23 17:53:24 node1 corosync[3149]: [QUORUM] Members[1]: 1
Feb 23 17:53:24 node1 corosync[3149]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 23 17:53:24 node1 rgmanager[3804]: #1: Quorum Dissolved
Feb 23 17:53:24 node1 kernel: dlm: closing connection to node 3
Feb 23 17:53:24 node1 corosync[3149]: [CPG ] chosen downlist: sender r(0) ip(192.168.1.102) ; members(old:2 left:1)
Feb 23 17:53:24 node1 corosync[3149]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 23 17:56:56 node1 corosync[3149]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 23 17:56:56 node1 corosync[3149]: [CMAN ] quorum regained, resuming activity
Feb 23 17:56:56 node1 corosync[3149]: [QUORUM] This node is within the primary component and will provide service.
Feb 23 17:56:56 node1 corosync[3149]: [QUORUM] Members[2]: 1 2
Feb 23 17:56:56 node1 corosync[3149]: [QUORUM] Members[2]: 1 2
Feb 23 17:56:56 node1 corosync[3149]: [CPG ] chosen downlist: sender r(0) ip(192.168.1.102) ; members(old:1 left:0)
Feb 23 17:56:56 node1 corosync[3149]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 23 17:57:02 node1 fenced[3258]: fencing node node3.example.com
Feb 23 17:57:12 node1 fenced[3258]: fence node3.example.com success
- Whenever fencing is executed for a node in the cluster, another node seems to power off as well
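When comparing timelines across nodes, it can help to reduce a log like the excerpt above to just its membership and fencing events. The following is a minimal sketch, not a supported tool: the function name `summarize` and the event labels are hypothetical, and the patterns assume the exact message formats shown in the excerpt, which may differ between corosync and fenced versions.

```python
import re

# Patterns are based only on the message formats in the log excerpt above.
TIMESTAMP = re.compile(r"^\w{3}\s+\d+\s+(\d{2}:\d{2}:\d{2})\s")
PATTERNS = [
    ("members", re.compile(r"\[QUORUM\] Members\[\d+\]: ([\d ]+)")),
    ("fence_start", re.compile(r"fenced\[\d+\]: fencing node (\S+)")),
    ("fence_success", re.compile(r"fenced\[\d+\]: fence (\S+) success")),
]


def summarize(lines):
    """Return (time, event, detail) tuples for membership and fencing lines."""
    events = []
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if stamp is None:
            continue  # not a syslog-style line
        for event, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                entry = (stamp.group(1), event, match.group(1).strip())
                # corosync may print the same membership line twice; keep one.
                if not events or events[-1] != entry:
                    events.append(entry)
                break
    return events


if __name__ == "__main__":
    sample = [
        "Feb 23 17:52:56 node1 corosync[3149]: [QUORUM] Members[2]: 1 3",
        "Feb 23 17:52:57 node1 fenced[3258]: fencing node node2.example.com",
        "Feb 23 17:53:18 node1 fenced[3258]: fence node2.example.com success",
        "Feb 23 17:53:24 node1 corosync[3149]: [QUORUM] Members[1]: 1",
    ]
    for time, event, detail in summarize(sample):
        print(time, event, detail)
```

Run against the excerpt above, this kind of summary makes it easier to see that node 3 dropped out of the membership while the fence operation against node 2 was still in progress.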
Environment
- Red Hat Enterprise Linux (RHEL) 5, 6, or 7 with the High Availability Add-On
