Compute or remote nodes get fenced following network interface down in OpenStack HA Instance

Solution Verified

Environment

  • Red Hat OpenStack Platform HA Instance
  • Red Hat Enterprise Linux (RHEL) 8, 9, 10 with the High Availability Add-On
  • Additional fence_kdump configuration

Issue

After corosync's main network interface is lost on a controller node (either through hardware failure or by manually bringing the interface down), the compute (remote) nodes attached to that controller are also fenced:

$ cat controller2/var/log/messages
-----------------------------------------8<----------------------------------------- 
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute1 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute1 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Scheduling node compute1 for fencing
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute3 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute3 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Scheduling node compute3 for fencing
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute4 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute4 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Scheduling node compute4 for fencing
-----------------------------------------8<----------------------------------------- 
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: Unclean nodes will not be fenced until quorum is attained or no-quorum-policy is set to ignore
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: Actions: Fence (reboot) galera-bundle-2 (resource: galera-bundle-podman-2) 'guest is unclean'
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: Actions: Fence (reboot) compute4 'compute-unfence-trigger:3 is thought to be active there' <---
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: Actions: Fence (reboot) compute3 'compute-unfence-trigger:2 is thought to be active there' <----
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: Actions: Fence (reboot) compute1 'compute-unfence-trigger:1 is thought to be active there' <----
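
To check whether a given log contains the pattern shown above, a small helper along the following lines may be useful. It is only a sketch: the function name nodes_fenceable_without_quorum is hypothetical, and the default path /var/log/messages is an assumption about where the Pacemaker messages are logged on your system.

```shell
#!/bin/sh
# List each node that pacemaker-schedulerd decided it could fence without
# quorum. The log path is taken from the first argument and falls back to
# /var/log/messages (an assumption; adjust for your logging setup).
nodes_fenceable_without_quorum() {
    logfile="${1:-/var/log/messages}"
    # Match the scheduler notice quoted in this article and keep only the
    # node name, then collapse the repeated entries.
    sed -n 's/.*pacemaker-schedulerd.*We can fence \([^ ]*\) without quorum.*/\1/p' "$logfile" | sort -u
}

# Example (not executed here):
#   nodes_fenceable_without_quorum controller2/var/log/messages
```

Any compute node names printed by this helper on a controller that had lost quorum suggest the behaviour described in this article.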

Resolution

This issue occurs due to a race condition in which fencing (stonith) of the controller node completes more slowly than fencing of the attached compute nodes (remote nodes). In most cases, the controller node (member node) is fenced before the compute nodes (remote nodes), but the cluster can be configured in a way where this does not happen.

The only known conditions under which this has occurred are:

  • The controller nodes (member nodes) and the compute nodes (remote nodes) had different timeouts configured for the fence_kdump stonith device.
  • The controller nodes (member nodes) performed a graceful shutdown when they were fenced instead of an "immediate power off".

To avoid this issue, configure the cluster as described below.


A new cluster option, fence-remote-without-quorum, has been added to prevent a member node that lacks quorum from fencing a remote node. In certain cluster configurations, Pacemaker remote nodes may otherwise be fenced even though the resource managing the node could be restarted on a quorate node.
For more background on why this option was introduced, please contact Red Hat by opening a support case.

If the option fence-remote-without-quorum is false, then pacemaker will require a cluster (member) node to have quorum in order to fence a remote node.

     # pcs property set fence-remote-without-quorum=false
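
After setting the property, it can be read back from the cluster configuration. The sketch below uses crm_attribute, which ships with Pacemaker; the wrapper name show_fence_remote_without_quorum and the fallback message are hypothetical, and the assumption is that the query exits non-zero when the property has no explicit value or the cluster cannot be reached.

```shell
#!/bin/sh
# Print the explicitly configured value of fence-remote-without-quorum.
# Assumption: crm_attribute returns a non-zero exit status when the property
# has not been set or the cluster is not reachable; in that case a note is
# printed instead, and the release default applies if the cluster is running.
show_fence_remote_without_quorum() {
    crm_attribute --type crm_config --name fence-remote-without-quorum --query 2>/dev/null \
        || echo "no explicit value found (property unset, or cluster not reachable)"
}

# Example (run as root on a cluster node, not executed here):
#   show_fence_remote_without_quorum
```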

The defaults for the option:

  • RHEL 8: fence-remote-without-quorum=true
  • RHEL 9: fence-remote-without-quorum=true
  • RHEL 10 or later: fence-remote-without-quorum=false

Red Hat Enterprise Linux 8

  • This issue RHEL-93222 has been resolved with the errata RHBA-2025:10552 for RHEL 8.4.0.z that provides the following package(s): pacemaker-2.0.5-9.el8_4.10.
  • This issue RHEL-93223 has been resolved with the errata RHBA-2025:22581 for RHEL 8.6.0.z that provides the following package(s): pacemaker-2.1.2-4.el8_6.9.
  • This issue RHEL-93224 has been resolved with the errata RHBA-2025:22059 for RHEL 8.8.0.z that provides the following package(s): pacemaker-2.1.5-9.5.el8_8.
  • This issue RHEL-93220 has been resolved with the errata RHBA-2025:14543 for RHEL 8.10.z that provides the following package(s): pacemaker-2.1.7-5.3.el8_10.

Red Hat Enterprise Linux 9

  • This issue RHEL-93812 has been resolved with the errata RHBA-2025:10535 for RHEL 9.2.0.z that provides the following package(s): pacemaker-2.1.5-9.el9_2.5.
  • This issue RHEL-93813 has been resolved with the errata RHBA-2025:19608 for RHEL 9.4.z that provides the following package(s): pacemaker-2.1.7-5.3.el9_4.
  • This issue RHEL-92513 has been resolved with the errata RHBA-2025:9455 for RHEL 9.6.z that provides the following package(s): pacemaker-2.1.9-1.2.el9_6.

Red Hat Enterprise Linux 10

  • This issue RHEL-101072 has been resolved with the errata RHBA-2025:12860 for RHEL 10.0.z that provides the following package(s): pacemaker-3.0.0-5.1.el10_0.
  • This issue RHEL-86146 has been resolved with the errata RHBA-2025:20454 for RHEL 10.1 that provides the following package(s): pacemaker-3.0.1-3.el10.

Root Cause

This issue occurs due to a race condition in which fencing (stonith) of the controller node (member node) completes more slowly than fencing of the attached compute nodes (remote nodes). In most cases, the controller node (member node) is fenced before the compute nodes (remote nodes), but the cluster can be configured in a way where this does not happen. When the controller node (member node) is not fenced first, the resulting delay allows the compute nodes (remote nodes) to be fenced. The race has been observed when the following conditions apply:

  1. The timeout configured for the fence_kdump stonith device on the controller nodes (member nodes) is higher than the timeout configured on the compute nodes (remote nodes).
  2. The controller nodes (member nodes) are not configured to do an "immediate power off".
  3. A controller node (member node) and the compute nodes (remote nodes running on the controller node) need to be fenced, but no panic occurs on any of them. This means that fence_kdump will time out on all of the nodes before the next fence agent runs.

The end result could be that the compute nodes (remote nodes) are fenced before the controller node (member node).

While a controller node is being fenced, any attached remote nodes are expected to migrate to other controller nodes without interruption. Under normal circumstances, a controller node without quorum should not fence a remote node. However, there are special cases in which this event can still be scheduled, and it can then be carried out if the controller is not fenced first.


The following sequence of events shows how this commonly occurs.

First, ifdown is run against corosync's main network interface:

Mar 15 19:17:37 controller2 ifdown[273810]: You are using 'ifdown' script provided by 'network-scripts', which are now deprecated.
Mar 15 19:17:37 controller2 ifdown[273812]: 'network-scripts' will be removed from distribution in near future.
Mar 15 19:17:37 controller2 ifdown[273814]: It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
Mar 15 19:17:37 controller2 NetworkManager[2029]: <info>  [1742066257.6228] audit: op="connections-load" args="/etc/sysconfig/network-scripts/ifcfg-vlan20" pid=273829 uid=0 result="success"
Mar 15 19:17:37 controller2 NetworkManager[2029]: <info>  [1742066257.6441] audit: op="connections-load" args="/etc/sysconfig/network-scripts/ifcfg-vlan20" pid=273838 uid=0 result="success"
Mar 15 19:17:37 controller2 NetworkManager[2029]: <info>  [1742066257.6607] audit: op="connections-load" args="/etc/sysconfig/network-scripts/ifcfg-vlan20" pid=273852 uid=0 result="success"

The corosync token is lost 10s later:

Mar 15 19:17:47 controller2 corosync[3079]:  [TOTEM ] A processor failed, forming new configuration: token timed out (10650ms), waiting 12780ms for consensus.

The monitors for the compute nodes fail about 43s after the ifdown. That is about 13s later than a 30s timeout would predict if the monitor had started immediately after the ifdown, or only about 3s later if we assume that the maximum interval between remote node monitors (10s) had to pass first. Note that the specific error is "Remote executor did not respond"; this executor is essential for monitoring the compute (remote) nodes:

Mar 15 19:18:20 controller2 pacemaker-controld[3427]: error: Result of monitor operation for compute1 on controller2: Timed Out after 30s (Remote executor did not respond)
Mar 15 19:18:20 controller2 pacemaker-controld[3427]: error: Lost connection to Pacemaker Remote node compute1
Mar 15 19:18:20 controller2 pacemaker-controld[3427]: error: Result of monitor operation for compute-unfence-trigger on compute1: Internal communication failure (Action was pending when executor connection was dropped)
Mar 15 19:18:20 controller2 pacemaker-controld[3427]: error: Result of monitor operation for compute4 on controller2: Timed Out after 30s (Remote executor did not respond)
Mar 15 19:18:20 controller2 pacemaker-controld[3427]: error: Lost connection to Pacemaker Remote node compute4
Mar 15 19:18:20 controller2 pacemaker-controld[3427]: error: Result of monitor operation for compute-unfence-trigger on compute4: Internal communication failure (Action was pending when executor connection was dropped)
Mar 15 19:18:20 controller2 pacemaker-controld[3427]: error: Result of monitor operation for compute3 on controller2: Timed Out after 30s (Remote executor did not respond)
Mar 15 19:18:20 controller2 pacemaker-controld[3427]: error: Lost connection to Pacemaker Remote node compute3
Mar 15 19:18:20 controller2 pacemaker-controld[3427]: error: Result of monitor operation for compute-unfence-trigger on compute3: Internal communication failure (Action was pending when executor connection was dropped)
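
The timing figures quoted before these log entries can be reproduced from the timestamps. The following is a minimal sketch that assumes GNU date is available; the helper name elapsed_seconds is hypothetical, and the year 2025 is taken from the scheduler messages later in this article.

```shell
#!/bin/sh
# Seconds elapsed between two timestamps, using GNU date (assumed available).
# Both timestamps are interpreted as UTC so that local time zone rules cannot
# affect the difference.
elapsed_seconds() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $((end - start))
}

# ifdown at 19:17:37, compute monitor failures at 19:18:20
total=$(elapsed_seconds "2025-03-15 19:17:37" "2025-03-15 19:18:20")
echo "ifdown to monitor failure: ${total}s"                              # 43s
echo "beyond the 30s monitor timeout: $((total - 30))s"                  # 13s
echo "beyond timeout plus 10s monitor interval: $((total - 30 - 10))s"   # 3s
```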

Before this error, the executor appears to be occupied with other errors and monitors (mostly for the guest resources), and it is disconnected after those guest resources fail. This ordering is broadly consistent across all of these events. The remote monitors appear to be blocked behind the timeouts shown below and complete only after those timeouts do. These errors result in the executor being killed, which guarantees the failure of the remote monitors 10s later:

Mar 15 19:18:07.117 controller2 pacemakerd          [3340] (pcmk__ipc_is_authentic_process_active)      info: Could not connect to crmd IPC: timeout
Mar 15 19:18:07.117 controller2 pacemakerd          [3340] (check_next_subdaemon)       notice: pacemaker-controld[3427] is unresponsive to ipc after 1 tries
Mar 15 19:18:10.508 controller2 pacemaker-controld  [3427] (pcmk__read_remote_message)  error: Timed out (10000 ms) while waiting for remote data
Mar 15 19:18:10.508 controller2 pacemaker-controld  [3427] (lrmd_tls_send_recv)         error: Disconnecting remote after request 2805 reply not received: Timer expired | rc=62 timeout=120000ms
Mar 15 19:18:10.509 controller2 pacemaker-controld  [3427] (lrmd_tls_connection_destroy)        info: TLS connection destroyed
Mar 15 19:18:10.509 controller2 pacemaker-controld  [3427] (remote_lrm_op_callback)     error: Lost connection to Pacemaker Remote node rabbitmq-bundle-2
Mar 15 19:18:10.509 controller2 pacemaker-controld  [3427] (log_executor_event)         error: Result of monitor operation for rabbitmq-bundle-2 on controller2: Error (Lost connection to remote executor) | CIB update 348, graph action unconfirmed; call=18 key=rabbitmq-bundle-2_monitor_30000
Mar 15 19:18:10.509 controller2 pacemaker-controld  [3427] (lrmd_send_command)  error: Couldn't perform lrmd_rsc_exec operation (timeout=120000): -62: Success (0)
Mar 15 19:18:10.509 controller2 pacemaker-controld  [3427] (lrmd_send_command)  error: Executor disconnected

This monitor failure alone does not yet trigger a fence event for the compute nodes. As shown below, they are initially marked as unclean but unable to be fenced:

Mar 15 19:18:21 controller2 pacemaker-fenced[3423]: notice: Node compute3 state is now lost
Mar 15 19:18:21 controller2 pacemaker-fenced[3423]: notice: Node compute4 state is now lost
Mar 15 19:18:21 controller2 pacemaker-fenced[3423]: notice: Node compute1 state is now lost
-----------------------------------------8<----------------------------------------- 
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: warning: Node compute1 is unclean but cannot be fenced
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: warning: Node compute3 is unclean but cannot be fenced
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: warning: Node compute4 is unclean but cannot be fenced

The stop operations for the compute-unfence-trigger resources are scheduled but are unable to run:

Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: warning: compute-unfence-trigger:1_stop_0 on compute1 is unrunnable (node is offline)
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: error: Stopping compute-unfence-trigger:1 until compute1 can be unfenced
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: warning: compute-unfence-trigger:1_stop_0 on compute1 is unrunnable (node is offline)
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: error: Stopping compute-unfence-trigger:1 until compute1 can be unfenced
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: warning: compute-unfence-trigger:2_stop_0 on compute3 is unrunnable (node is offline)
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: error: Stopping compute-unfence-trigger:2 until compute3 can be unfenced
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: warning: compute-unfence-trigger:2_stop_0 on compute3 is unrunnable (node is offline)
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: error: Stopping compute-unfence-trigger:2 until compute3 can be unfenced
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: warning: compute-unfence-trigger:3_stop_0 on compute4 is unrunnable (node is offline)
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: error: Stopping compute-unfence-trigger:3 until compute4 can be unfenced
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: warning: compute-unfence-trigger:3_stop_0 on compute4 is unrunnable (node is offline)
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: error: Stopping compute-unfence-trigger:3 until compute4 can be unfenced
Mar 15 19:19:23 controller2 pacemaker-schedulerd[3426]: warning: nova-evacuate_stop_0 on controller3 is unrunnable (node is offline)

Since the above stop operations were unable to run (likely again because the executor was disconnected), they are determined to be failures, and compute-unfence-trigger is still considered to be active:

Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Unexpected result (error: Action was pending when executor connection was dropped) was recorded for monitor of compute-unfence-trigger:1 on compute1 at Mar 15 18:57:36 2025
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Remote node compute1 is unclean: compute-unfence-trigger:1 is thought to be active there
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Unexpected result (error: Action was pending when executor connection was dropped) was recorded for monitor of compute-unfence-trigger:2 on compute3 at Mar 15 19:12:39 2025
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Remote node compute3 is unclean: compute-unfence-trigger:2 is thought to be active there
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Unexpected result (error: Action was pending when executor connection was dropped) was recorded for monitor of compute-unfence-trigger:3 on compute4 at Mar 15 19:03:01 2025
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Remote node compute4 is unclean: compute-unfence-trigger:3 is thought to be active there

The failed stop operations for compute-unfence-trigger are treated as grounds for fencing, even without quorum, so stonith is then scheduled for the compute nodes:

Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute1 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute1 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Scheduling node compute1 for fencing
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute3 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute3 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Scheduling node compute3 for fencing
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute4 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: We can fence compute4 without quorum because they're in our membership
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: warning: Scheduling node compute4 for fencing
-----------------------------------------8<----------------------------------------- 
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: Unclean nodes will not be fenced until quorum is attained or no-quorum-policy is set to ignore
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: Actions: Fence (reboot) galera-bundle-2 (resource: galera-bundle-podman-2) 'guest is unclean'
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: Actions: Fence (reboot) compute4 'compute-unfence-trigger:3 is thought to be active there' <---
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: Actions: Fence (reboot) compute3 'compute-unfence-trigger:2 is thought to be active there' <----
Mar 15 19:19:45 controller2 pacemaker-schedulerd[3426]: notice: Actions: Fence (reboot) compute1 'compute-unfence-trigger:1 is thought to be active there' <----

Then fencing is scheduled just after this:

Mar 15 19:19:45 controller2 pacemaker-controld[3427]: notice: Requesting fencing (reboot) of node compute4
Mar 15 19:19:45 controller2 pacemaker-controld[3427]: notice: Requesting fencing (reboot) of node compute3
Mar 15 19:19:45 controller2 pacemaker-controld[3427]: notice: Requesting fencing (reboot) of node compute1
Mar 15 19:19:45 controller2 pacemaker-fenced[3423]: notice: Client pacemaker-controld.3427 wants to fence (reboot) compute4 using any device
Mar 15 19:19:45 controller2 pacemaker-fenced[3423]: notice: Requesting peer fencing (off) targeting compute4
Mar 15 19:19:45 controller2 pacemaker-fenced[3423]: notice: Client pacemaker-controld.3427 wants to fence (reboot) compute3 using any device
Mar 15 19:19:45 controller2 pacemaker-attrd[3425]: notice: Setting fail-count-rabbitmq-bundle-2#monitor_30000[controller2]: 1 -> (unset)
Mar 15 19:19:45 controller2 pacemaker-fenced[3423]: notice: Requesting peer fencing (off) targeting compute3
Mar 15 19:19:45 controller2 pacemaker-attrd[3425]: notice: Setting last-failure-rabbitmq-bundle-2#monitor_30000[controller2]: 1742066300 -> (unset)
Mar 15 19:19:45 controller2 pacemaker-fenced[3423]: notice: Client pacemaker-controld.3427 wants to fence (reboot) compute1 using any device
Mar 15 19:19:45 controller2 pacemaker-fenced[3423]: notice: Requesting peer fencing (off) targeting compute1

With the fence_kdump configuration in this environment, fence_kdump is configured to time out after 180s for the controller but after only 30s for the compute nodes. This guarantees that the controller remains running long enough that it can still fence the compute nodes (for details on controlling the fence_kdump timeout, refer to solution-7021305):

$ pcs config
-----------------------------------------8<----------------------------------------- 
Resource: stonith-fence_kdump-901b0eadc4b6 (class=stonith type=fence_kdump)
  Attributes: stonith-fence_kdump-901b0eadc4b6-instance_attributes
    ipport=7410
    pcmk_host_list=controller2
    pcmk_off_retries=1
    pcmk_off_timeout=180
    timeout=180 <--- controller
-----------------------------------------8<----------------------------------------- 
Resource: stonith-fence_kdump-525400603117 (class=stonith type=fence_kdump)
  Attributes: stonith-fence_kdump-525400603117-instance_attributes
    ipport=7410
    pcmk_host_list=compute3
    pcmk_off_retries=1
    pcmk_off_timeout=30
    timeout=30 <--- compute
-----------------------------------------8<----------------------------------------- 
Resource: stonith-fence_kdump-52540027332a (class=stonith type=fence_kdump)
  Attributes: stonith-fence_kdump-52540027332a-instance_attributes
    ipport=7410
    pcmk_host_list=compute4
    pcmk_off_retries=1
    pcmk_off_timeout=30
    timeout=30 <--- compute
-----------------------------------------8<----------------------------------------- 
Resource: stonith-fence_kdump-525400aa6f76 (class=stonith type=fence_kdump)
  Attributes: stonith-fence_kdump-525400aa6f76-instance_attributes
    ipport=7410
    pcmk_host_list=compute1
    pcmk_off_retries=1
    pcmk_off_timeout=30
    timeout=30  <--- compute

While stonith is in progress against both controller2 and compute4, we can see fence_kdump time out for compute4:

Mar 15 19:20:54.721 controller2 pacemaker-fenced    [3423] (log_async_result)   error: Operation 'off' [284893] targeting compute4 using stonith-fence_kdump-52540027332a could not be executed: Timed Out (Fence agent did not complete within 30s) | call 13 from pacemaker-controld.3427
Mar 15 19:20:54.722 controller2 pacemaker-fenced    [3423] (fenced_process_fencing_reply)       notice: Action 'off' targeting compute4 using stonith-fence_kdump-52540027332a on behalf of pacemaker-controld.3427@controller2: Timed Out (Fence agent did not complete within 30s)

Power fencing (fence_ipmilan) then runs and succeeds immediately, beating the stonith action against the controller in a "fence race":

Mar 15 19:20:54.824 controller2 pacemaker-fenced    [3423] (log_async_result)   notice: Operation 'off' [286365] targeting compute4 using stonith-fence_ipmilan-52540027332a returned 0 | call 13 from pacemaker-controld.3427
Mar 15 19:20:54.825 controller2 pacemaker-fenced    [3423] (fenced_process_fencing_reply)       notice: Action 'off' targeting compute4 using stonith-fence_ipmilan-52540027332a on behalf of pacemaker-controld.3427@controller2: complete

The issue does not occur when fence_kdump is configured with equal timeouts for the controllers and the compute nodes, or when the compute nodes are configured with higher timeouts than the controllers.
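
To review the fence_kdump timeouts across all devices at a glance, the pcs config output can be filtered. This is a sketch only: the function names fence_kdump_timeouts and fence_kdump_timeouts_equal are hypothetical, and the parsing assumes the text layout of the pcs config excerpt shown earlier in this article, which may differ between pcs versions.

```shell
#!/bin/sh
# Read `pcs config` style text on stdin and print "<host> <timeout>" for each
# fence_kdump stonith device. Assumes the "Resource: ... type=fence_kdump"
# header followed by indented key=value lines, as in this article.
fence_kdump_timeouts() {
    awk '
        /^ *Resource:/ { in_kdump = ($0 ~ /type=fence_kdump/); host = "" }
        in_kdump && /^ *pcmk_host_list=/ { sub(/^ *pcmk_host_list=/, ""); host = $1 }
        in_kdump && /^ *timeout=/        { sub(/^ *timeout=/, ""); print host, $1 }
    '
}

# Exit status 0 when every fence_kdump device uses the same timeout,
# 1 when at least two different timeouts are configured.
fence_kdump_timeouts_equal() {
    fence_kdump_timeouts | awk '
        { seen[$2] = 1 }
        END { n = 0; for (t in seen) n++; exit (n > 1) }
    '
}

# Example (not executed here):
#   pcs config | fence_kdump_timeouts
#   pcs config | fence_kdump_timeouts_equal || echo "fence_kdump timeouts differ"
```

The equality check is deliberately stricter than necessary: a configuration in which the compute nodes have higher timeouts than the controllers is also reported as differing, so the per-host listing should be reviewed before changing anything.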
