Fencing fails in a RHEL 7 High Availability cluster because systemd initiates a graceful shutdown


Issue

  • Fencing fails because systemd-logind handles the "power button" signal and initiates a graceful shutdown instead of letting the system be power-cycled immediately.
  • When one node fences the other, the fenced node processes a power-button press and starts to shut down. Meanwhile, the fencing operation on the surviving node fails, apparently because it takes too long.
  • Do we need to disable ACPI / acpid in RHEL 7 clusters as we did in previous releases?
  • Do I need to do anything in addition to disabling ACPI on RHEL 7 cluster nodes to prevent them from shutting down gracefully when fenced? For example:
Aug 13 21:07:22 node01 systemd-logind: Power key pressed. 
Aug 13 21:07:22 node01 systemd-logind: Powering Off...
Aug 13 21:07:22 node01 systemd-logind: System is powering down.

Aug 13 21:07:42 node02 stonith-ng[2803]: notice: log_operation: Operation 'reboot' [3114] for device 'node01-ilo' returned: -62 (Timer expired)
  • A cluster node rebooted gracefully instead of being powered off immediately:
Nov  2 10:57:01 node41 stonith-ng[8161]:  notice: Operation reboot of node42 by node41 for crmd.20238@node42.8b66209c: OK
Nov  2 10:57:01 node42 crmd[20238]:    crit: We were allegedly just fenced by node41 for node42!
Nov  2 10:57:01 node42 stonith-ng[20234]:  notice: Operation reboot of node42 by node41 for crmd.20238@node42.8b66209c: OK
Nov  2 10:57:01 node42 systemd-logind: Power key pressed.
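To check whether a cluster is hitting this behavior, it can help to look for both symptoms in the system log: systemd-logind reporting a power-key press on the fenced node, and stonith-ng on the surviving node reporting that the reboot operation timed out. The following Python sketch is a hypothetical helper (the function and pattern names are illustrative and not part of any Red Hat tool). It assumes the default syslog line format shown in the excerpts above; message wording can differ between Pacemaker versions, so treat the patterns as a starting point rather than a complete detector.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical helper: patterns are modeled on the log excerpts above only.
# Syslog prefix, e.g. "Nov  2 10:57:01 node42 " (day may be space-padded).
_PREFIX = r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "

# systemd-logind reacting to the power-button event instead of the node dying.
POWER_KEY_RE = re.compile(_PREFIX + r"systemd-logind(?:\[\d+\])?: Power key pressed")

# A stonith-ng reboot that ran out of time; -62 is the negated Linux errno ETIME.
FENCE_TIMEOUT_RE = re.compile(
    _PREFIX
    + r"stonith-ng\[\d+\]:.*for device '(?P<device>[^']+)' returned: -62 \(Timer expired\)"
)


def scan_for_graceful_shutdown(lines: List[str]) -> Dict[str, List[Tuple[str, ...]]]:
    """Collect (timestamp, host[, device]) tuples for each matching symptom."""
    findings: Dict[str, List[Tuple[str, ...]]] = {"power_key": [], "fence_timeout": []}
    for line in lines:
        m = POWER_KEY_RE.search(line)
        if m:
            findings["power_key"].append((m["ts"], m["host"]))
            continue
        m = FENCE_TIMEOUT_RE.search(line)
        if m:
            findings["fence_timeout"].append((m["ts"], m["host"], m["device"]))
    return findings
```

On a RHEL 7 node the input would typically be the lines of /var/log/messages. A power-key hit on the fenced node close in time to a fence timeout on its peer matches the pattern described in this article.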

Environment

  • Red Hat Enterprise Linux (RHEL) 7 with the High Availability Add-On
  • One or more pacemaker cluster nodes (or pacemaker_remote nodes) associated with a stonith device that uses a power method which connects to a BMC or system-management controller such as an iLO, RSA, DRAC, or iDRAC.
