Support Policies for RHEL High Availability Clusters - sbd and fence_sbd

Overview

Applicable Environments

Red Hat Enterprise Linux (RHEL) with the High Availability Add-On

Useful References and Guides

Introduction

This guide describes Red Hat's policies and requirements for using the fencing components sbd and fence_sbd in RHEL High Availability clusters. Users of RHEL High Availability clusters should adhere to these policies in order to be eligible for support from Red Hat with the appropriate product support subscriptions.

Policies

Available/Supported Releases: sbd is available and supported for usage in RHEL HA clusters under the following conditions:

RHEL 9: Supported by Red Hat
RHEL 8: Supported by Red Hat
RHEL 7: Supported by Red Hat with RHEL 7.1 or later - using pacemaker-1.1.12-22.el7 or later and sbd-1.2.1-3 or later.
RHEL 6: Supported by Red Hat with RHEL 6.8 or later - using pacemaker-1.1.14-8.el6 or later and sbd-1.2.1-9.el6 or later.
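
The release conditions above can be expressed as a small lookup. The following is a minimal sketch in Python, not a Red Hat tool: the function name sbd_release_supported and the table layout are hypothetical, and it checks only the RHEL release, not the pacemaker and sbd package versions that are also listed for RHEL 6 and RHEL 7.

```python
# Minimum RHEL minor release per major release, taken from the list above.
# A minimum of 0 means every minor release of that major version qualifies.
MIN_MINOR_FOR_SBD = {6: 8, 7: 1, 8: 0, 9: 0}


def sbd_release_supported(major: int, minor: int) -> bool:
    """Return True if the given RHEL release meets the listed sbd condition."""
    min_minor = MIN_MINOR_FOR_SBD.get(major)
    if min_minor is None:
        # Release is not mentioned in the policy list; treat it as not covered.
        return False
    return minor >= min_minor
```

A result of True only means the release itself is covered; the installed pacemaker and sbd packages still need to be compared against the versions named in the list.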

Storage compatibility with sbd poison-pill fencing via block-device: Red Hat's typical storage compatibility policies for RHEL High Availability apply to sbd, in that Red Hat does not certify sbd with specific storage solutions. It is up to customer organizations using sbd poison-pill fencing via block-device to ensure their storage solution provides adequate shared access across cluster members through all relevant failure scenarios the cluster may need to endure.

See also: Support policies - Storage compatibility

Storage device requirements for sbd poison-pill fencing via block-device:

A block device must be shared by all nodes of the cluster.
All nodes must have write access to the shared device.
The shared block device cannot host a file system or be used by any other component.
The shared block device cannot be managed by LVM or incorporated into an LVM volume group.
The shared block device should not be managed by any RAID, mirroring, or replication that is administered by the cluster members themselves. Any replication employed must be transparent to the hosts.
A size of 4 megabytes is sufficient for the shared block device.
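
Two of the requirements above can be checked locally on a node: that the path is a block device and that it is writable. The sketch below is illustrative only and assumes a hypothetical helper name, check_sbd_device. It does not verify that the device is shared by all nodes, free of file systems, or outside LVM and host-managed RAID; those still need to be confirmed separately.

```python
import os
import stat
from typing import List


def check_sbd_device(path: str) -> List[str]:
    """Return a list of problems found; an empty list means none were detected."""
    try:
        st = os.stat(path)
    except OSError as exc:
        return [f"cannot stat {path}: {exc}"]

    problems = []
    if not stat.S_ISBLK(st.st_mode):
        problems.append(f"{path} is not a block device")
    if not os.access(path, os.W_OK):
        problems.append(f"{path} is not writable by this user")
    return problems
```

Running the check on every cluster node against the same persistent device path gives a first indication that each node can at least see and write to the device.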

Storage replication technologies with sbd poison-pill fencing via block-device: Red Hat does not test specific vendor storage replication technologies that may facilitate cluster-wide access to the block device used by sbd poison-pill fencing. While Red Hat does not prevent organizations from using such technologies, it cannot guarantee that sbd will function as expected with all of them. Organizations using these technologies should thoroughly test sbd in conjunction with their chosen solutions, especially through all potential failure scenarios that may create a split between nodes or storage arrays. Storage vendors should be consulted for input on the capabilities of their solutions and how they may perform in connection with RHEL High Availability or sbd.

Support with pacemaker only: sbd is only available and supported for usage when pacemaker is in use. sbd is suitable for usage in RHEL 6 clusters with cman and pacemaker, as well as RHEL 7 or later pacemaker clusters. sbd is not supported for usage with RHEL 6 pure-cman clusters that do not use pacemaker.

No support for sbd on pacemaker_remote nodes: sbd is not supported on pacemaker remote nodes, whether bare-metal or virtual-machine nodes, that utilize pacemaker_remote. Prior to RHEL 8.5, a watchdog-only SBD configuration required all nodes in the cluster to use SBD. That prevented using SBD in a cluster where some nodes supported it but other nodes (often remote nodes) required some other form of fencing. On RHEL 8.5 or later, a cluster that has pacemaker remote nodes can utilize sbd on the cluster nodes (not the remote nodes).

Suitable watchdog timer (WDT) devices: Red Hat does not maintain a list of supported hardware watchdog timer devices that can be used with sbd. Red Hat considers a device to be suitable for usage with sbd if:

The device is not amongst those specifically highlighted in this policy guide as unsupported.
The device's driver implements the structures, functions, and functionality that are expected of a watchdog timer device as explained within the kernel documentation. Red Hat would consider this true for drivers shipped within a RHEL-supplied kernel, other than softdog.
The device is guaranteed to abruptly halt the system if the device is not updated within the timeout period specified to the watchdog device using the kernel's watchdog API.
The device can be demonstrated to carry out this halting action under the expected circumstances.
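
As a first step in reviewing a watchdog device, it can help to identify which driver backs it. The sketch below reads the identity attribute from the kernel's watchdog sysfs class. It rests on two assumptions: that the running kernel exposes /sys/class/watchdog/<name>/identity (this depends on the kernel's watchdog sysfs support), and that the softdog driver reports the identity string "Software Watchdog", as in the upstream driver source. The function names are hypothetical. Identifying the driver does not demonstrate that the device halts the system; that still has to be shown by testing.

```python
from pathlib import Path
from typing import Optional

# Identity string assumed to be reported by the softdog driver.
SOFTDOG_IDENTITY = "Software Watchdog"


def watchdog_identity(name: str = "watchdog0",
                      sysfs_root: str = "/sys/class/watchdog") -> Optional[str]:
    """Return the driver's identity string, or None if sysfs does not expose it."""
    identity_file = Path(sysfs_root) / name / "identity"
    try:
        return identity_file.read_text().strip()
    except OSError:
        return None


def is_software_watchdog(identity: Optional[str]) -> bool:
    """Return True if the identity matches the softdog driver."""
    return identity == SOFTDOG_IDENTITY
```

A None result means the attribute could not be read, which by itself says nothing about whether a watchdog device is present.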

Support for virtualization-emulated watchdogs: Red Hat supports usage of sbd with a libvirt/KVM-provided hardware watchdog using emulated model i6300esb. Red Hat does not provide support or testing for use of any other emulated watchdog devices (e.g. VMware). However, if there is interest in another model or device on a different virtualization platform, please contact Red Hat Support to communicate this interest.
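
For a libvirt/KVM guest, the emulated watchdog model appears in the domain XML as a watchdog element under devices. The sketch below parses a domain definition, such as the output of virsh dumpxml, and reports whether an i6300esb watchdog is defined. The function names and the example domain (node1) are hypothetical; this is an illustration, not a support tool.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

SUPPORTED_MODEL = "i6300esb"  # emulated model named in the policy above

# Hypothetical, trimmed-down domain definition used only for illustration.
EXAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>node1</name>
  <devices>
    <watchdog model='i6300esb' action='reset'/>
  </devices>
</domain>
"""


def watchdog_models(domain_xml: str) -> List[Optional[str]]:
    """Return the model attribute of every watchdog device in the domain XML."""
    root = ET.fromstring(domain_xml)
    return [dev.get("model") for dev in root.findall("./devices/watchdog")]


def has_supported_watchdog(domain_xml: str) -> bool:
    """Return True if the domain defines a watchdog of the supported model."""
    return SUPPORTED_MODEL in watchdog_models(domain_xml)
```

Calling has_supported_watchdog(EXAMPLE_DOMAIN_XML) checks the sample definition; for a real guest, the XML would come from the hypervisor host.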

VMware: There are known limitations when using SBD on VMware, and certain configurations in which using SBD on VMware is strongly discouraged. For more information, see the following article: Software-Emulated Watchdog Known Limitations.

Software-Emulated Watchdog: The watchdog timer device used by sbd cannot be effectively emulated by software, such as is done by the softdog driver. Such programs operate within the limitations and available resources provided by the kernel, and thus cannot be guaranteed to carry out the necessary halting action if the operating system is malfunctioning or starved of resources.

Using such a configuration in production is not recommended, because a failure to properly fence a node can cause data corruption.

The ability to reach a full root cause determination is also greatly diminished in such a configuration. If a cluster failure is encountered with this configuration, Red Hat Support reserves the right to discontinue an investigation once the configuration is strongly suspected to be, directly or indirectly, the cause.

For further considerations regarding this configuration, see: Software-Emulated Watchdog Known Limitations.

Investigations may require confirmation of a suitable device: Because Red Hat does not maintain a list of suitable devices for usage with sbd and thus may not have extensive knowledge of every device type, questions may arise in the course of investigations as to the suitability of the device in use. If unexpected behavior is observed with sbd, Red Hat may request and require demonstration of the functionality of the watchdog device in use before it can deeply investigate. In situations where the functionality of the device itself is questionable, Red Hat may require the concerning behavior from sbd be reproduced on known hardware, or may require investigation from the vendor of the device.

RHEL 6 with QDisk: RHEL 6 clusters utilizing QDisk are not compatible with or supported with sbd.

sbd compatibility with cloud platforms: Red Hat supports sbd or fence_sbd on any cloud platform that provides a suitable watchdog timer (note that softdog is currently supported with the caveats described in the 'Software-Emulated Watchdog' section of this article).

SAP HANA on Azure (Large Instances) systems are not cloud VMs but rather bare-metal systems managed through the Azure interface. For sbd to be supported on this platform, an IPMI watchdog must be configured and enabled on each system as documented in the Configure Watchdog section of the Microsoft documentation. For more information, see: Azure Large Instances high availability for SAP on RHEL.

Recommendations when using fence_sbd: Red Hat recommends not specifying the method option with fence_sbd, so that the fence agent uses its default method of cycle. fence_sbd is not a standard power fencing agent, and the cycle method is supported and recommended for it. For more information about why method=cycle should not be used for power fencing, see: Resources run on two nodes simultaneously, data on shared storage is corrupted, and/or other unexpected behavior occurs in a RHEL High Availability cluster using fence_ipmilan with method=cycle.
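
The recommendation above can be turned into a simple review of the options set on a fence_sbd device. The sketch below is a hypothetical helper, not part of pcs or the fence agents: it takes the configured options as a dictionary and returns advisory notes about the method option only. The device path in the usage example is a placeholder.

```python
from typing import Dict, List

DEFAULT_METHOD = "cycle"  # default for fence_sbd as stated in the recommendation


def review_fence_sbd_options(options: Dict[str, str]) -> List[str]:
    """Return advisory notes about the method option of a fence_sbd device."""
    method = options.get("method")
    if method is None:
        # Nothing set: the agent falls back to its default, as recommended.
        return []
    if method == DEFAULT_METHOD:
        return ["method=cycle matches the default; the option can be omitted"]
    return [f"method={method} overrides the recommended default of cycle"]


# Usage example with a placeholder device path.
notes = review_fence_sbd_options({"devices": "/dev/disk/by-id/example"})
```

An empty list indicates the option set already follows the recommendation.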
