A stonith reboot action retries one time fewer than pcmk_reboot_retries

Solution Verified

Issue

  • A stonith device's action (e.g., reboot, off, monitor) retries one time fewer than the configured value.
  • The pcmk_reboot_retries attribute is not honored.
  • For example, pcmk_off_retries is set to 4, but the off action is retried only 3 times (4 attempts in total), as the following output shows.

[root@fastvm-rhel-7-6-21 sos]# pcs stonith show kdump
 Resource: kdump (class=stonith type=fence_kdump)
  Attributes: pcmk_host_list=node2 pcmk_off_retries=4 pcmk_reboot_action=off timeout=5
  Operations: monitor interval=60s (kdump-monitor-interval-60s)

[root@fastvm-rhel-7-6-21 sos]# tail -f /var/log/messages
Jun 17 15:19:44 fastvm-rhel-7-6-21 stonith-ng[12562]:  notice: Client stonith_admin.28945.18a8558d wants to fence (reboot) 'node2' with device '(any)'
Jun 17 15:19:44 fastvm-rhel-7-6-21 stonith-ng[12562]:  notice: Requesting peer fencing (reboot) of node2
Jun 17 15:19:44 fastvm-rhel-7-6-21 stonith-ng[12562]:  notice: kdump can fence (reboot) node2: static-list
Jun 17 15:19:44 fastvm-rhel-7-6-21 stonith-ng[12562]:  notice: kdump can fence (reboot) node2: static-list
Jun 17 15:19:44 fastvm-rhel-7-6-21 stonith-ng[12562]: warning: Agent 'fence_kdump' does not advertise support for 'reboot', performing 'off' action instead
Jun 17 15:19:44 fastvm-rhel-7-6-21 fence_kdump[28946]: waiting for message from '192.168.22.22'
Jun 17 15:19:49 fastvm-rhel-7-6-21 fence_kdump[28946]: timeout after 5 seconds
Jun 17 15:19:50 fastvm-rhel-7-6-21 fence_kdump[28947]: waiting for message from '192.168.22.22'
Jun 17 15:19:55 fastvm-rhel-7-6-21 fence_kdump[28947]: timeout after 5 seconds
Jun 17 15:19:56 fastvm-rhel-7-6-21 fence_kdump[28948]: waiting for message from '192.168.22.22'
Jun 17 15:20:01 fastvm-rhel-7-6-21 fence_kdump[28948]: timeout after 5 seconds
Jun 17 15:20:02 fastvm-rhel-7-6-21 fence_kdump[28949]: waiting for message from '192.168.22.22'
Jun 17 15:20:07 fastvm-rhel-7-6-21 fence_kdump[28949]: timeout after 5 seconds
Jun 17 15:20:07 fastvm-rhel-7-6-21 stonith-ng[12562]:   error: Operation 'reboot' [28949] (call 2 from stonith_admin.28945) for host 'node2' with device 'kdump' returned: -61 (No data available)
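
To check the count in the log excerpt above, the attempts can be tallied with a short script. This is a minimal sketch, not part of any Pacemaker tooling. It assumes that every fence_kdump attempt runs as its own process and logs one "waiting for message" line, which is what the excerpt shows. The names ATTEMPT_RE, count_attempts, and SAMPLE are hypothetical and exist only for this illustration.

```python
import re

# Assumption: each fence_kdump attempt logs one "waiting for message" line
# under its own PID, as seen in the excerpt above.
ATTEMPT_RE = re.compile(r"fence_kdump\[(\d+)\]: waiting for message from")


def count_attempts(log_lines):
    """Return the number of distinct fence_kdump PIDs that started an attempt."""
    pids = set()
    for line in log_lines:
        match = ATTEMPT_RE.search(line)
        if match:
            pids.add(match.group(1))
    return len(pids)


# Lines copied from the log excerpt in this article.
SAMPLE = """\
Jun 17 15:19:44 fastvm-rhel-7-6-21 fence_kdump[28946]: waiting for message from '192.168.22.22'
Jun 17 15:19:49 fastvm-rhel-7-6-21 fence_kdump[28946]: timeout after 5 seconds
Jun 17 15:19:50 fastvm-rhel-7-6-21 fence_kdump[28947]: waiting for message from '192.168.22.22'
Jun 17 15:19:55 fastvm-rhel-7-6-21 fence_kdump[28947]: timeout after 5 seconds
Jun 17 15:19:56 fastvm-rhel-7-6-21 fence_kdump[28948]: waiting for message from '192.168.22.22'
Jun 17 15:20:01 fastvm-rhel-7-6-21 fence_kdump[28948]: timeout after 5 seconds
Jun 17 15:20:02 fastvm-rhel-7-6-21 fence_kdump[28949]: waiting for message from '192.168.22.22'
Jun 17 15:20:07 fastvm-rhel-7-6-21 fence_kdump[28949]: timeout after 5 seconds
"""

attempts = count_attempts(SAMPLE.splitlines())
retries = attempts - 1  # the first attempt is not a retry
print(f"attempts={attempts} retries={retries}")
```

For this excerpt the script reports 4 attempts, that is, 3 retries, against a configured pcmk_off_retries of 4. To run it against a live system, the lines of /var/log/messages for the fencing operation in question could be passed to count_attempts instead of SAMPLE.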

Environment

  • Red Hat Enterprise Linux 6, 7, or 8 (with the High Availability Add-On)
  • Pacemaker
