When a Pacemaker cluster node tries to fence itself and no other nodes are online, only stonith level 1 is attempted

Solution In Progress - Updated -

Issue

  • During pcs cluster stop --all, one node shuts down successfully. The other node fails to stop a resource, but the fencing that should follow never succeeds.
  • If a node is the only node online and tries to fence itself, it attempts only the level 1 stonith device. If level 1 fails, level 1 is retried repeatedly, and level 2 is never attempted.
[root@fastvm-rhel-8-0-23 ~]# pcs status
...
Node fastvm-rhel-8-0-24: OFFLINE (standby)
Online: [ fastvm-rhel-8-0-23 ]
...

[root@fastvm-rhel-8-0-23 ~]# pcs stonith config
 Resource: xvm (class=stonith type=fence_xvm)
  Attributes: pcmk_delay_max=10s pcmk_host_map=fastvm-rhel-8-0-23:fastvm-rhel-8.0-23;fastvm-rhel-8-0-24:fastvm-rhel-8.0-24
  Operations: monitor interval=60s (xvm-monitor-interval-60s)
 Resource: kdump (class=stonith type=fence_kdump)
  Attributes: pcmk_host_list="fastvm-rhel-8-0-23 fastvm-rhel-8-0-24" pcmk_monitor_action=metadata pcmk_reboot_action=off
  Operations: monitor interval=60s (kdump-monitor-interval-60s)
 Target: fastvm-rhel-8-0-23
   Level 1 - kdump
   Level 2 - xvm
 Target: fastvm-rhel-8-0-24
   Level 1 - kdump
   Level 2 - xvm
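
To see the configured level order at a glance, the Target/Level lines of the output above can be flattened into one line per level. The following is a minimal sketch, not a supported tool: the function name list_fencing_levels is hypothetical, and it assumes the "Target:" and "Level N - device" layout shown in this example, which may differ between pcs versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `pcs stonith config` output on stdin and print
# "<target> <level> <device-list>" for each fencing level.
# Assumes the "Target: <name>" / "Level <N> - <devices>" layout shown above.
list_fencing_levels() {
    awk '
        # Remember the most recent target node name.
        $1 == "Target:" { target = $2; next }
        # Emit one line per level that belongs to the current target.
        $1 == "Level" && $3 == "-" && target != "" {
            print target, $2, $4
        }
    '
}

# Example (on a cluster node):
#   pcs stonith config | list_fencing_levels
```

For the configuration in this article, each node should show kdump at level 1 and xvm at level 2, so xvm is expected to run whenever kdump fails.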


[root@fastvm-rhel-8-0-23 ~]# tail -f /var/log/messages
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Result of stop operation for dummy on fastvm-rhel-8-0-23: 1 (unknown error)
...
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-schedulerd[1714]: warning: Calculated transition 4 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-112.bz2
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Requesting fencing (reboot) of node fastvm-rhel-8-0-23
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Client pacemaker-controld.1715.bb47a777 wants to fence (reboot) 'fastvm-rhel-8-0-23' with device '(any)'
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: error: No alternate host available to handle complex self fencing request
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Peer[1] fastvm-rhel-8-0-23
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Requesting peer fencing (reboot) of fastvm-rhel-8-0-23
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: kdump is eligible to fence (reboot) fastvm-rhel-8-0-23: static-list
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: xvm is eligible to fence (reboot) fastvm-rhel-8-0-23 (aka. 'fastvm-rhel-8.0-23'): static-list
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: Agent 'fence_kdump' does not advertise support for 'reboot', performing 'off' action instead
Jun  9 18:33:19 fastvm-rhel-8-0-23 fence_kdump[1802]: waiting for message from '192.168.22.23'
...
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: fence_kdump_off_1 process (PID 1802) timed out
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: fence_kdump_off_1:1802 - timed out after 60000ms
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: error: Operation 'reboot' [1802] (call 3 from pacemaker-controld.1715) for host 'fastvm-rhel-8-0-23' with device 'kdump' returned: -62 (Timer expired)
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Transition 4 aborted: Stonith failed
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Peer fastvm-rhel-8-0-23 was not terminated (reboot) by fastvm-rhel-8-0-23 on behalf of pacemaker-controld.1715: Timer expired
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Transition 4 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-warn-112.bz2): Complete
...
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-schedulerd[1714]: warning: Calculated transition 5 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-112.bz2
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Requesting fencing (reboot) of node fastvm-rhel-8-0-23
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Client pacemaker-controld.1715.bb47a777 wants to fence (reboot) 'fastvm-rhel-8-0-23' with device '(any)'
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: error: No alternate host available to handle complex self fencing request
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Peer[1] fastvm-rhel-8-0-23
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Requesting peer fencing (reboot) of fastvm-rhel-8-0-23
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: kdump is eligible to fence (reboot) fastvm-rhel-8-0-23: static-list
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: xvm is eligible to fence (reboot) fastvm-rhel-8-0-23 (aka. 'fastvm-rhel-8.0-23'): static-list
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: Agent 'fence_kdump' does not advertise support for 'reboot', performing 'off' action instead
Jun  9 18:34:19 fastvm-rhel-8-0-23 fence_kdump[1817]: waiting for message from '192.168.22.23'
...
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: fence_kdump_off_1 process (PID 1817) timed out
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: fence_kdump_off_1:1817 - timed out after 60000ms
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: error: Operation 'reboot' [1817] (call 4 from pacemaker-controld.1715) for host 'fastvm-rhel-8-0-23' with device 'kdump' returned: -62 (Timer expired)
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: error: Operation reboot of fastvm-rhel-8-0-23 by fastvm-rhel-8-0-23 for pacemaker-controld.1715@fastvm-rhel-8-0-23.22c7dfe5: Timer expired
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Transition 5 aborted: Stonith failed
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Peer fastvm-rhel-8-0-23 was not terminated (reboot) by fastvm-rhel-8-0-23 on behalf of pacemaker-controld.1715: Timer expired
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Transition 5 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-warn-112.bz2): Complete
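
One way to confirm from the logs that only the level 1 device is ever attempted is to count fencing operation results per device. The sketch below is illustrative only: the function name fence_results_by_device is hypothetical, and it assumes the pacemaker-fenced message format seen in the excerpt above ("... with device '<name>' returned: ..."), which may vary between Pacemaker versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read syslog lines on stdin and print
# "<device> <count>" for every pacemaker-fenced operation result.
# Assumes the "with device '<name>' returned:" wording shown above.
fence_results_by_device() {
    # Keep only the device name from matching pacemaker-fenced lines,
    # then count how many results were logged for each device.
    sed -nE "s/.*pacemaker-fenced\[[0-9]+\]:.* with device '([^']+)' returned: .*/\1/p" \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Example (on the affected node):
#   fence_results_by_device < /var/log/messages
```

If the output lists the level 1 device (kdump in this example) but no entry for the level 2 device (xvm), the node is likely hitting the behavior described in this article.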

Environment

  • Red Hat Enterprise Linux 7 or 8 (with the High Availability Add-on)
