clvmd, GFS2, rgmanager, or other cluster components are not responsive after a node rebooted on its own and fencing never completed in RHEL 6
Issue
- A node crashed for some reason and rebooted on its own. I see fencing initiated in the logs, but it never completed. The node rejoined the cluster, but GFS2 remained unresponsive even after it did.
Mar 25 02:02:42 node1 corosync[2028]: [TOTEM ] A processor failed, forming new configuration.
Mar 25 02:02:44 node1 corosync[2028]: [QUORUM] Members[1]: 1
Mar 25 02:02:44 node1 corosync[2028]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 25 02:02:44 node1 corosync[2028]: [CPG ] chosen downlist: sender r(0) ip(10.193.20.241) ; members(old:2 left:1)
Mar 25 02:02:44 node1 corosync[2028]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 25 02:02:44 node1 rgmanager[3320]: State change: node2 DOWN
Mar 25 02:02:45 node1 kernel: dlm: closing connection to node 2
Mar 25 02:02:45 node1 kernel: dlm: canceled swork for node 2
Mar 25 02:02:45 node1 fenced[2334]: fencing node node2
Mar 25 02:02:45 node1 kernel: GFS2: fsid=myCluster:fs1.1: jid=0: Trying to acquire journal lock...
Mar 25 02:03:13 node1 corosync[2028]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 25 02:03:13 node1 corosync[2028]: [QUORUM] Members[2]: 1 2
Mar 25 02:03:13 node1 corosync[2028]: [QUORUM] Members[2]: 1 2
Mar 25 02:03:13 node1 corosync[2028]: [CPG ] chosen downlist: sender r(0) ip(10.193.20.241) ; members(old:1 left:0)
Mar 25 02:03:13 node1 corosync[2028]: [MAIN ] Completed service synchronization, ready to provide service.
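The key symptom in the excerpt above is that fenced logs "fencing node node2" but no result line ever follows it. A completed fence action is normally followed by a success or failure message from fenced (the exact wording varies by release), so its absence can be checked mechanically. A minimal sketch of that check, run here against the two relevant lines from this article (against a live system the same grep would target /var/log/messages):

```shell
# Save the relevant lines from the excerpt above; note that no fence result
# line ever follows the "fencing node node2" entry.
cat > /tmp/messages.excerpt <<'EOF'
Mar 25 02:02:45 node1 fenced[2334]: fencing node node2
Mar 25 02:03:13 node1 corosync[2028]: [MAIN ] Completed service synchronization, ready to provide service.
EOF

# A completed fence action would have logged a success/failure line from fenced.
if grep 'fenced' /tmp/messages.excerpt | grep -q 'success\|fail'; then
    echo "fence action completed"
else
    echo "fence action never completed"
fi
```

With the excerpt above, this prints "fence action never completed", which matches the reported behavior: DLM and GFS2 recovery block until fencing of the failed node finishes, so the filesystem stays unresponsive even after node2 rejoins.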
Environment
- Red Hat Enterprise Linux (RHEL) 6 with the Resilient Storage Add On
- GFS2
- A fence action was executed for which no success or failure was logged