Why does fence_sbd log the message 'power timeout needs to be greater then sbd message timeout'?

Solution In Progress

Environment

  • Red Hat Enterprise Linux Server 8 (RHEL) with the High Availability or Resilient Storage Add-Ons
  • pacemaker cluster
  • fence-agents-*-4.2.1-65.el8 or later

Issue

  • After patching cluster-related packages, the fence_sbd device frequently logs the following messages:

    fence_sbd[<PID>]: power timeout needs to be greater then sbd message timeout
    pacemaker-fenced[<PID>]:  warning: fence_sbd[<PID>] stderr: [ <`date`> WARNING: power timeout needs to be greater then sbd message timeout ]
    
  • What does the message "power timeout needs to be greater then sbd message timeout" mean?

  • Should I change the power_timeout value for fence_sbd?

Resolution

A new option called disable-timeout was introduced for all fence devices in fence-agents-*-4.2.1-65.el8 and later. The option is true by default, which sets the power_timeout option to 0. As a result, fence_sbd logs the message each time pacemaker checks the status of the fence device (by default, every 60s).

Red Hat Enterprise Linux 8

  • The issue is being tracked with bugzilla 1971683: Bug 1971683 - The fence-agent "fence_sbd" spams the log when "disable_timeout" is enabled (RHEL 8 ---). As of Mon, June 14 2021, the status of 1971683 is NEW. There has been no engineer assigned to this bug yet and it is likely in the early stages of investigation.

Workaround

After updating the fence device package, set the disable-timeout option to false and include an explicit power_timeout value to solve the issue:

# pcs stonith update <fence-device-name> disable-timeout=false power-timeout=20

This topic is currently under discussion with our software backline engineering to assess whether the disable-timeout option can be set to false by default for the fence_sbd fence device. To be informed about this, please open a case with Red Hat Support[1].

Root Cause

The disable-timeout option is available in fence-agents-sbd-4.2.1-65.el8 or later. It was introduced to help with fencing failures caused by timeouts, which leave resources blocked until fencing succeeds or a manual intervention is performed.

See the article What is the 'disable-timeout' option available for fence devices? for more details about this option.

The option is set to true by default, which sets the following timeout options for the fence device to 0: power_timeout, login_timeout and shell_timeout. This causes an issue with fence_sbd after patching, because fence_sbd expects the power_timeout option to be greater than the sbd_msg_timeout parameter.

The main() function of /usr/sbin/fence_sbd contains the following check, which expects the power_timeout value to be greater than sbd_msg_timeout (10 seconds by default):

 # grep "greater then sbd message timeout" /usr/sbin/fence_sbd  -B 6
    # we check against the defined timeouts. If the pacemaker timeout is smaller
    # then that defined within sbd we should report this.
    power_timeout = int(options["--power-timeout"])
    sbd_msg_timeout = get_msg_timeout(options)
    if power_timeout <= sbd_msg_timeout:
        logging.warn("power timeout needs to be \
                greater then sbd message timeout")
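
The interaction between the two settings can be sketched in Python as follows. This is a simplified model and not the fence_sbd source: the function names effective_power_timeout and needs_warning are hypothetical, and the sbd message timeout is passed in as a plain number instead of being read from the SBD device. It assumes, as described in this article, that disable-timeout=true forces power_timeout to 0.

```python
def effective_power_timeout(power_timeout, disable_timeout):
    """Return the power timeout the agent would see (assumed model)."""
    # Assumption from this article: disable-timeout=true sets power_timeout to 0.
    return 0 if disable_timeout else power_timeout


def needs_warning(power_timeout, sbd_msg_timeout=10):
    """Mirror the comparison shown in the fence_sbd excerpt above."""
    return power_timeout <= sbd_msg_timeout


# Default after the update: disable-timeout=true, so the warning is triggered.
print(needs_warning(effective_power_timeout(20, disable_timeout=True)))   # True
# Workaround: disable-timeout=false with power-timeout=20 and a 10s message timeout.
print(needs_warning(effective_power_timeout(20, disable_timeout=False)))  # False
```

Under this model, any power_timeout value above the sbd message timeout avoids the warning once disable-timeout is set to false, which is why the workaround sets both options together.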

[1] How do I open and manage a support case on the Customer Portal?

Diagnostic Steps

Check the /var/log/messages file for a message similar to the following:

fence_sbd[<PID>]: power timeout needs to be greater then sbd message timeout
pacemaker-fenced[<PID>]:  warning: fence_sbd[<PID>] stderr: [ <`date`> WARNING: power timeout needs to be greater then sbd message timeout ]
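
As a rough aid for this check, the following Python sketch counts how many lines of a log contain the warning text. The helper names count_warnings and count_warnings_in_file are hypothetical, the sample lines use made-up PIDs, and the script assumes it has permission to read /var/log/messages when the file helper is used.

```python
WARNING_TEXT = "power timeout needs to be greater then sbd message timeout"


def count_warnings(lines):
    """Count log lines that contain the fence_sbd warning text."""
    return sum(1 for line in lines if WARNING_TEXT in line)


def count_warnings_in_file(path="/var/log/messages"):
    """Scan a log file; undecodable bytes are replaced instead of raising."""
    with open(path, errors="replace") as log:
        return count_warnings(log)


# Illustrative sample only: the PIDs below are invented placeholders.
sample = [
    "fence_sbd[1234]: power timeout needs to be greater then sbd message timeout",
    "pacemaker-fenced[1200]:  warning: fence_sbd[1234] stderr: "
    "[ WARNING: power timeout needs to be greater then sbd message timeout ]",
    "an unrelated log line",
]
print(count_warnings(sample))  # 2
```

A count that keeps growing at the monitor interval of the fence device is consistent with the behavior described in this article.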

