Cluster services stop unexpectedly with error: "crit: We were allegedly just fenced"

Solution Verified

Issue

Following a fence event, the fenced node stops cluster services instead of rebooting. Cluster services stay down, and manual intervention is required to bring them back up.

$ cat /var/log/messages
-----------------------------------------8<----------------------------------------- 
Oct  9 11:58:16 rhel8-node1 pacemaker-controld[2842]: crit: We were allegedly just fenced by rhel8-node2 for rhel8-node2!
Oct  9 11:58:16 rhel8-node1 pacemakerd[2605]: warning: Shutting cluster down because pacemaker-controld[2842] had fatal failure
Oct  9 11:58:16 rhel8-node1 pacemakerd[2605]: notice: Shutting down Pacemaker
Oct  9 11:58:16 rhel8-node1 pacemakerd[2605]: notice: Stopping pacemaker-schedulerd
Oct  9 11:58:16 rhel8-node1 pacemaker-schedulerd[2841]: notice: Caught 'Terminated' signal
Oct  9 11:58:16 rhel8-node1 pacemakerd[2605]: notice: Stopping pacemaker-attrd
Oct  9 11:58:16 rhel8-node1 pacemaker-attrd[2840]: notice: Caught 'Terminated' signal
Oct  9 11:58:16 rhel8-node1 pacemakerd[2605]: notice: Stopping pacemaker-execd
Oct  9 11:58:16 rhel8-node1 pacemaker-execd[2839]: notice: Caught 'Terminated' signal
Oct  9 11:58:18 rhel8-node1 pacemakerd[2605]: notice: Stopping pacemaker-fenced
Oct  9 11:58:18 rhel8-node1 pacemaker-fenced[2838]: notice: Caught 'Terminated' signal
Oct  9 11:58:18 rhel8-node1 pacemakerd[2605]: notice: Stopping pacemaker-based
Oct  9 11:58:18 rhel8-node1 pacemaker-based[2837]: notice: Caught 'Terminated' signal
Oct  9 11:58:18 rhel8-node1 pacemaker-based[2837]: notice: Disconnected from Corosync
Oct  9 11:58:18 rhel8-node1 pacemaker-based[2837]: notice: Disconnected from Corosync
Oct  9 11:58:18 rhel8-node1 pacemakerd[2605]: notice: Shutdown complete
Oct  9 11:58:18 rhel8-node1 pacemakerd[2605]: notice: Shutting down and staying down after fatal error
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [CFG   ] Node 2 was shut down by sysadmin
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [SERV  ] Unloading all Corosync service engines.
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [QB    ] withdrawing server sockets
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [QB    ] withdrawing server sockets
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [SERV  ] Service engine unloaded: corosync configuration map access
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [QB    ] withdrawing server sockets
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [SERV  ] Service engine unloaded: corosync configuration service
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [QB    ] withdrawing server sockets
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [QB    ] withdrawing server sockets
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Oct  9 11:58:18 rhel8-node1 corosync[1537]:  [SERV  ] Service engine unloaded: corosync profile loading service
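
To check whether a node hit this condition, it can help to search the log for the two messages that mark it in the excerpt above: the "allegedly just fenced" line from pacemaker-controld and the "staying down after fatal error" line from pacemakerd. The following is a minimal sketch, not a Pacemaker tool. The function name find_fence_shutdown is hypothetical, and the sketch assumes the messages are written to a plain-text syslog file such as /var/log/messages.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Pacemaker): print the log lines that
# indicate a node stopped cluster services after being told it was fenced.
# The two patterns are copied from the log excerpt in this article.
find_fence_shutdown() {
    # $1: log file to search; defaults to /var/log/messages (an assumption)
    logfile="${1:-/var/log/messages}"
    grep -E 'We were allegedly just fenced|Shutting down and staying down after fatal error' "$logfile"
}
```

Running `find_fence_shutdown /var/log/messages` on the affected node prints the matching lines, if any. An empty result only means these strings were not found in that file, not that the node was never fenced.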

Environment

  • Red Hat Enterprise Linux 7, 8 or 9 (with the High Availability Add-on)
  • Pacemaker Cluster
