Fencing fails in a RHEL 7, 8, 9 High Availability cluster because systemd initiates a graceful shutdown

Solution Verified

Issue

  • Fencing fails because systemd-logind handles the "power button" signal and initiates a graceful shutdown instead of power-cycling the system.
  • When one node fences the other, the fenced node processes a power-button press and starts to shut down. Meanwhile, the fence operation fails on the fencing node, apparently because it takes too long.
  • Do we need to disable ACPI / acpid in RHEL 7 clusters like we did in previous releases?
  • Do I need to do anything in addition to disabling ACPI on RHEL 7 cluster nodes to prevent a soft shutdown? For example:
Aug 13 21:07:22 node01 systemd-logind: Power key pressed. 
Aug 13 21:07:22 node01 systemd-logind: Powering Off...
Aug 13 21:07:22 node01 systemd-logind: System is powering down.
Aug 13 21:07:42 node02 stonith-ng[2803]: notice: log_operation: Operation 'reboot' [3114] for device 'node01-ilo' returned: -62 (Timer expired)
  • A cluster node gracefully rebooted instead of being hard killed on RHEL 7:
Nov  2 10:57:01 node41 stonith-ng[8161]:  notice: Operation reboot of node42 by node42 for crmd.20238@uxplpsgrd03.8b66209c: OK
Nov  2 10:57:01 node42 crmd[20238]:    crit: We were allegedly just fenced by node41 for node42!
Nov  2 10:57:01 node42 stonith-ng[20234]:  notice: Operation reboot of node42 by node41 for crmd.20238@node42.8b66209c: OK
Nov  2 10:57:01 node42 systemd-logind: Power key pressed.
  • A cluster node gracefully rebooted instead of being hard killed on RHEL 8:
Sep 18 16:19:11  rhel8-1 stonith-ng[8161]:  notice: Operation reboot of rhel8-1 by rhel8-2 for crmd.20238@uxplpsgrd03.8b66209c: OK
Sep 18 16:19:11  rhel8-1 crmd[20238]:    crit: We were allegedly just fenced by rhel8-1 for rhel8-2!
Sep 18 16:19:11 rhel8-1 systemd-logind[792]: Session 1 logged out. Waiting for processes to exit.
Sep 18 16:19:11 rhel8-1 systemd-logind[792]: Removed session 1.
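
Several of the excerpts above share one recognizable signature: systemd-logind on the fenced node logs a power-key event around the time the fence operation runs. The following is a minimal sketch for spotting that signature in saved log text. The function name find_power_key_events is hypothetical, the sample lines are copied from the excerpts above, and the pattern only covers the syslog-style message format shown here; other log formats would need a different pattern.

```python
import re

# Matches syslog-style lines such as:
#   Aug 13 21:07:22 node01 systemd-logind: Power key pressed.
# An optional PID suffix like systemd-logind[792] is also accepted.
POWER_KEY = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+systemd-logind(?:\[\d+\])?:\s+Power key pressed"
)


def find_power_key_events(lines):
    """Return (timestamp, host) pairs for each power-key message found."""
    events = []
    for line in lines:
        match = POWER_KEY.match(line.strip())
        if match:
            events.append((match.group("time"), match.group("host")))
    return events


if __name__ == "__main__":
    # Sample lines taken from the excerpts in this article.
    sample = [
        "Aug 13 21:07:22 node01 systemd-logind: Power key pressed. ",
        "Aug 13 21:07:22 node01 systemd-logind: Powering Off...",
        "Nov  2 10:57:01 node42 systemd-logind: Power key pressed.",
    ]
    for when, host in find_power_key_events(sample):
        print(f"{host}: power key handled by systemd-logind at {when}")
```

A hit on the node that was the fence target, close in time to a stonith-ng "Timer expired" or "Operation reboot" message on its peer, is consistent with the graceful-shutdown behavior described in this article.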

Environment

  • Red Hat Enterprise Linux (RHEL) 7, 8, 9 with the High Availability Add-On
  • One or more pacemaker cluster nodes (or pacemaker_remote nodes) associated with a stonith device that uses a power method connecting to a BMC or system-management controller such as an iLO, RSA, DRAC, iDRAC, etc.
