Cluster services stop unexpectedly with error: "crit: We were allegedly just fenced"
Issue
Following a fence event, the fenced node stops cluster services instead of rebooting. Cluster services stay down on that node, and manual intervention is required to bring them back up.
$ cat /var/log/messages
-----------------------------------------8<-----------------------------------------
Oct 9 11:58:16 rhel8-node1 pacemaker-controld[2842]: crit: We were allegedly just fenced by rhel8-node2 for rhel8-node2!
Oct 9 11:58:16 rhel8-node1 pacemakerd[2605]: warning: Shutting cluster down because pacemaker-controld[2842] had fatal failure
Oct 9 11:58:16 rhel8-node1 pacemakerd[2605]: notice: Shutting down Pacemaker
Oct 9 11:58:16 rhel8-node1 pacemakerd[2605]: notice: Stopping pacemaker-schedulerd
Oct 9 11:58:16 rhel8-node1 pacemaker-schedulerd[2841]: notice: Caught 'Terminated' signal
Oct 9 11:58:16 rhel8-node1 pacemakerd[2605]: notice: Stopping pacemaker-attrd
Oct 9 11:58:16 rhel8-node1 pacemaker-attrd[2840]: notice: Caught 'Terminated' signal
Oct 9 11:58:16 rhel8-node1 pacemakerd[2605]: notice: Stopping pacemaker-execd
Oct 9 11:58:16 rhel8-node1 pacemaker-execd[2839]: notice: Caught 'Terminated' signal
Oct 9 11:58:18 rhel8-node1 pacemakerd[2605]: notice: Stopping pacemaker-fenced
Oct 9 11:58:18 rhel8-node1 pacemaker-fenced[2838]: notice: Caught 'Terminated' signal
Oct 9 11:58:18 rhel8-node1 pacemakerd[2605]: notice: Stopping pacemaker-based
Oct 9 11:58:18 rhel8-node1 pacemaker-based[2837]: notice: Caught 'Terminated' signal
Oct 9 11:58:18 rhel8-node1 pacemaker-based[2837]: notice: Disconnected from Corosync
Oct 9 11:58:18 rhel8-node1 pacemaker-based[2837]: notice: Disconnected from Corosync
Oct 9 11:58:18 rhel8-node1 pacemakerd[2605]: notice: Shutdown complete
Oct 9 11:58:18 rhel8-node1 pacemakerd[2605]: notice: Shutting down and staying down after fatal error
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [CFG ] Node 2 was shut down by sysadmin
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [SERV ] Unloading all Corosync service engines.
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [QB ] withdrawing server sockets
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [SERV ] Service engine unloaded: corosync vote quorum service v1.0
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [QB ] withdrawing server sockets
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [SERV ] Service engine unloaded: corosync configuration map access
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [QB ] withdrawing server sockets
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [SERV ] Service engine unloaded: corosync configuration service
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [QB ] withdrawing server sockets
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [QB ] withdrawing server sockets
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Oct 9 11:58:18 rhel8-node1 corosync[1537]: [SERV ] Service engine unloaded: corosync profile loading service
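To check whether a node hit this condition, one option is to search its system log for the two messages that mark the event in the excerpt above. The following is a minimal sketch, not an official diagnostic: the function name find_fence_stops is hypothetical, and it assumes the log is a plain-text syslog file such as /var/log/messages.

```shell
# Hypothetical helper (not part of pacemaker or pcs): print the log lines
# that mark the "fenced but did not reboot" condition described above.
# Argument 1: path to a syslog file; defaults to /var/log/messages.
find_fence_stops() {
    log_file="${1:-/var/log/messages}"
    # -F matches fixed strings, so the message text needs no regex escaping.
    # Both strings are copied from the log excerpt in this article.
    grep -F \
        -e 'We were allegedly just fenced' \
        -e 'Shutting down and staying down after fatal error' \
        "$log_file"
}
```

Running find_fence_stops /var/log/messages on the affected node prints the matching lines, if any. Because the function ends with grep, it returns a non-zero exit status when neither message is present.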
Environment
- Red Hat Enterprise Linux 7, 8 or 9 (with the High Availability Add-on)
- Pacemaker Cluster