When a Pacemaker cluster node tries to fence itself and no other nodes are online, only stonith level 1 is attempted

Solution In Progress - Updated -

Issue

  • During pcs cluster stop --all, one node shuts down successfully. The other node fails to stop a resource but is never fenced.
  • If a node is the only node online and tries to fence itself, it attempts only the stonith level 1 device. If level 1 fails, level 1 is retried repeatedly, and level 2 is never attempted.
[root@fastvm-rhel-8-0-23 ~]# pcs status
...
Node fastvm-rhel-8-0-24: OFFLINE (standby)
Online: [ fastvm-rhel-8-0-23 ]
...

[root@fastvm-rhel-8-0-23 ~]# pcs stonith config
 Resource: xvm (class=stonith type=fence_xvm)
  Attributes: pcmk_delay_max=10s pcmk_host_map=fastvm-rhel-8-0-23:fastvm-rhel-8.0-23;fastvm-rhel-8-0-24:fastvm-rhel-8.0-24
  Operations: monitor interval=60s (xvm-monitor-interval-60s)
 Resource: kdump (class=stonith type=fence_kdump)
  Attributes: pcmk_host_list="fastvm-rhel-8-0-23 fastvm-rhel-8-0-24" pcmk_monitor_action=metadata pcmk_reboot_action=off
  Operations: monitor interval=60s (kdump-monitor-interval-60s)
 Target: fastvm-rhel-8-0-23
   Level 1 - kdump
   Level 2 - xvm
 Target: fastvm-rhel-8-0-24
   Level 1 - kdump
   Level 2 - xvm
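
For reference, a fencing topology like the one shown above is normally built with pcs stonith level add, one command per level and target. The helper below is only a sketch: print_level_commands is a hypothetical name, it assumes the kdump and xvm stonith resources already exist, and it only prints the commands so they can be reviewed before being run as root on a cluster node.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): print, but do not run, the
# "pcs stonith level add" commands that register an ordered list of stonith
# devices as fencing levels 1..N for each target node.
#   $1: space-separated node names
#   $2: space-separated stonith resource IDs, in level order
print_level_commands() {
    nodes=$1
    devices=$2
    for node in $nodes; do
        level=1
        for dev in $devices; do
            echo "pcs stonith level add $level $node $dev"
            level=$((level + 1))
        done
    done
}
```

Calling print_level_commands "fastvm-rhel-8-0-23 fastvm-rhel-8-0-24" "kdump xvm" prints four commands that match the levels listed in the pcs stonith config output.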


[root@fastvm-rhel-8-0-23 ~]# tail -f /var/log/messages
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Result of stop operation for dummy on fastvm-rhel-8-0-23: 1 (unknown error)
...
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-schedulerd[1714]: warning: Calculated transition 4 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-112.bz2
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Requesting fencing (reboot) of node fastvm-rhel-8-0-23
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Client pacemaker-controld.1715.bb47a777 wants to fence (reboot) 'fastvm-rhel-8-0-23' with device '(any)'
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: error: No alternate host available to handle complex self fencing request
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Peer[1] fastvm-rhel-8-0-23
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Requesting peer fencing (reboot) of fastvm-rhel-8-0-23
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: kdump is eligible to fence (reboot) fastvm-rhel-8-0-23: static-list
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: xvm is eligible to fence (reboot) fastvm-rhel-8-0-23 (aka. 'fastvm-rhel-8.0-23'): static-list
Jun  9 18:33:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: Agent 'fence_kdump' does not advertise support for 'reboot', performing 'off' action instead
Jun  9 18:33:19 fastvm-rhel-8-0-23 fence_kdump[1802]: waiting for message from '192.168.22.23'
...
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: fence_kdump_off_1 process (PID 1802) timed out
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: fence_kdump_off_1:1802 - timed out after 60000ms
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: error: Operation 'reboot' [1802] (call 3 from pacemaker-controld.1715) for host 'fastvm-rhel-8-0-23' with device 'kdump' returned: -62 (Timer expired)
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Transition 4 aborted: Stonith failed
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Peer fastvm-rhel-8-0-23 was not terminated (reboot) by fastvm-rhel-8-0-23 on behalf of pacemaker-controld.1715: Timer expired
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Transition 4 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-warn-112.bz2): Complete
...
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-schedulerd[1714]: warning: Calculated transition 5 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-112.bz2
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Requesting fencing (reboot) of node fastvm-rhel-8-0-23
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Client pacemaker-controld.1715.bb47a777 wants to fence (reboot) 'fastvm-rhel-8-0-23' with device '(any)'
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: error: No alternate host available to handle complex self fencing request
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Peer[1] fastvm-rhel-8-0-23
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: Requesting peer fencing (reboot) of fastvm-rhel-8-0-23
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: kdump is eligible to fence (reboot) fastvm-rhel-8-0-23: static-list
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: notice: xvm is eligible to fence (reboot) fastvm-rhel-8-0-23 (aka. 'fastvm-rhel-8.0-23'): static-list
Jun  9 18:34:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: Agent 'fence_kdump' does not advertise support for 'reboot', performing 'off' action instead
Jun  9 18:34:19 fastvm-rhel-8-0-23 fence_kdump[1817]: waiting for message from '192.168.22.23'
...
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: fence_kdump_off_1 process (PID 1817) timed out
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: warning: fence_kdump_off_1:1817 - timed out after 60000ms
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: error: Operation 'reboot' [1817] (call 4 from pacemaker-controld.1715) for host 'fastvm-rhel-8-0-23' with device 'kdump' returned: -62 (Timer expired)
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-fenced[1711]: error: Operation reboot of fastvm-rhel-8-0-23 by fastvm-rhel-8-0-23 for pacemaker-controld.1715@fastvm-rhel-8-0-23.22c7dfe5: Timer expired
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Transition 5 aborted: Stonith failed
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Peer fastvm-rhel-8-0-23 was not terminated (reboot) by fastvm-rhel-8-0-23 on behalf of pacemaker-controld.1715: Timer expired
Jun  9 18:35:19 fastvm-rhel-8-0-23 pacemaker-controld[1715]: notice: Transition 5 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-warn-112.bz2): Complete
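
To check whether a node is affected, it can help to count which stonith devices pacemaker-fenced actually ran. The following is a rough sketch rather than a supported tool: summarize_fence_attempts is a hypothetical name, and it assumes the log messages have the same wording as the excerpt above ("with device '<name>' returned:" and "No alternate host available to handle complex self fencing request").

```shell
#!/bin/sh
# Hypothetical helper: summarize fencing attempts per stonith device.
# Reads syslog-style files given as arguments, or standard input if none.
summarize_fence_attempts() {
    # Split each line on single quotes so that a quoted device name becomes
    # the field that follows the text "with device ".
    awk -F"'" '
        # One completed device operation reported by pacemaker-fenced.
        /pacemaker-fenced/ && /returned:/ {
            for (i = 1; i < NF; i++)
                if ($i ~ /with device $/) attempts[$(i + 1)]++
        }
        # Self-fencing request made while no other node was available.
        /No alternate host available to handle complex self fencing request/ {
            self++
        }
        END {
            for (d in attempts)
                printf "device=%s attempts=%d\n", d, attempts[d]
            printf "self_fencing_without_peer=%d\n", self + 0
        }
    ' "$@" | sort
}
```

For example, summarize_fence_attempts /var/log/messages run against the excerpt above reports device=kdump attempts=2 and self_fencing_without_peer=2. The absence of any line for xvm confirms that the level 2 device was never run.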

Environment

  • Red Hat Enterprise Linux 7 or 8 (with the High Availability Add-on)
