Why does fence_sbd log the message 'power timeout needs to be greater then sbd message timeout'?
Environment
- Red Hat Enterprise Linux Server 8 (RHEL) with the High Availability or Resilient Storage Add-Ons
- pacemaker cluster
- fence-agents-*-4.2.1-65.el8 or later
Issue
- After patching cluster-related packages, the fence_sbd device is logging error messages frequently:
  fence_sbd[<PID>]: power timeout needs to be greater then sbd message timeout
  pacemaker-fenced[<PID>]: warning: fence_sbd[<PID>] stderr: [ <`date`> WARNING: power timeout needs to be greater then sbd message timeout ]
- What does the message "power timeout needs to be greater then sbd message timeout" mean?
- Should I change the power_timeout value of fence_sbd?
Resolution
A new option called disable-timeout was introduced for all fence devices in fence-agents-*-4.2.1-65.el8 and later. The disable-timeout option is true by default, which sets the power_timeout option to 0. As a result, the message is generated each time pacemaker checks the status of fence_sbd (every 60s by default).
Red Hat Enterprise Linux 8
- The issue (bugzilla bug: 1971683) has been resolved with the errata RHBA-2022:1757 in the following package(s): fence-agents-4.2.1-89.el8 or later.
Workaround
After updating the fence device package, set the disable-timeout option to false and include a power_timeout value greater than the sbd message timeout to solve the issue:
# pcs stonith update <fence-device-name> disable_timeout=false power_timeout=20
This topic is currently under discussion with our software backline engineering to assess whether the disable-timeout option can be set to false by default for the fence_sbd fence device. To be informed about this, please open a case with Red Hat Support[1].
Root Cause
The disable-timeout option was introduced in fence-agents-sbd-4.2.1-65.el8 and later to help with fencing failures caused by timeouts, which leave resources blocked until fencing succeeds or a manual intervention is performed.
See the article What is the 'disable-timeout' option available for fence devices? for more details about this option.
The option is set to true by default, which sets the following timeout options for the fence device to 0: power_timeout, login_timeout and shell_timeout. This causes an issue with fence_sbd after patching, because fence_sbd requires the power_timeout option to be greater than the sbd_msg_timeout parameter.
The main() function of /usr/sbin/fence_sbd contains the following check, which expects the power_timeout value to be greater than sbd_msg_timeout (10 seconds by default):
# grep "greater then sbd message timeout" /usr/sbin/fence_sbd -B 6
    # we check against the defined timeouts. If the pacemaker timeout is smaller
    # then that defined within sbd we should report this.
    power_timeout = int(options["--power-timeout"])
    sbd_msg_timeout = get_msg_timeout(options)
    if power_timeout <= sbd_msg_timeout:
        logging.warn("power timeout needs to be \
                greater then sbd message timeout")
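The effect of this check can be illustrated with a minimal Python sketch. The function name warns_about_power_timeout is hypothetical and is not part of fence_sbd; it only mirrors the "<=" comparison shown above and assumes the 10-second sbd message timeout default mentioned in this article.

```python
import logging


def warns_about_power_timeout(power_timeout, sbd_msg_timeout=10):
    """Hypothetical helper mirroring the comparison in fence_sbd's main().

    Returns True when the warning would be logged, False otherwise.
    The default of 10 is the sbd message timeout default named in the article.
    """
    if power_timeout <= sbd_msg_timeout:
        logging.warning(
            "power timeout needs to be greater then sbd message timeout"
        )
        return True
    return False


# disable-timeout=true forces power_timeout to 0, so the check warns.
print(warns_about_power_timeout(0))
# With the workaround value (power_timeout=20) the check passes silently.
print(warns_about_power_timeout(20))
# Equal values still warn, because the comparison is "<=" and not "<".
print(warns_about_power_timeout(10))
```

Because the comparison is "<=", a power_timeout equal to the sbd message timeout is not sufficient; the value has to be strictly greater.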
Diagnostic Steps
Check the /var/log/messages file for a message similar to the following:
fence_sbd[<PID>]: power timeout needs to be greater then sbd message timeout
pacemaker-fenced[<PID>]: warning: fence_sbd[<PID>] stderr: [ <`date`> WARNING: power timeout needs to be greater then sbd message timeout ]
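The same check can be scripted. The following is a minimal sketch, not a Red Hat supplied tool: the function names find_power_timeout_warnings and scan_log are hypothetical, and the sample lines are illustrative placeholders rather than real log output. It performs a plain substring match on the message text quoted above.

```python
MESSAGE = "power timeout needs to be greater then sbd message timeout"


def find_power_timeout_warnings(lines):
    """Return the lines that contain the fence_sbd warning text."""
    return [line.rstrip("\n") for line in lines if MESSAGE in line]


def scan_log(path="/var/log/messages"):
    """Scan a log file for the warning (hypothetical helper).

    Assumes the log location named in this article; reading it
    usually requires root privileges.
    """
    with open(path, errors="replace") as log:
        return find_power_timeout_warnings(log)


# Illustrative placeholder lines, not real log output.
sample = [
    "fence_sbd[<PID>]: " + MESSAGE,
    "some unrelated log line",
]
for hit in find_power_timeout_warnings(sample):
    print(hit)
```

On a cluster node, calling scan_log() as root would return the matching lines from /var/log/messages; an empty list suggests the warning is not being logged there.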
[1] How do I open and manage a support case on the Customer Portal?