Support Policies for RHEL High Availability Clusters - General Requirements for Fencing/STONITH


Contents

Overview

Applicable Environments

  • Red Hat Enterprise Linux (RHEL) with the High Availability Add-On

Useful References and Guides

Introduction

Fencing, also known as STONITH ("Shoot The Other Node In The Head"), is a key aspect of a stable High Availability cluster design.

This guide presents Red Hat's policies and requirements around fencing, fence devices, and STONITH in a RHEL High Availability cluster, including clusters deployed in conjunction with other products, such as RHEL Resilient Storage, Red Hat OpenStack Platform, Red Hat Storage, Red Hat Satellite, and others. Users of RHEL High Availability clusters must adhere to these policies in order to be eligible for support from Red Hat with the appropriate product support subscriptions.

Policies

STONITH/fencing must be enabled: Wherever the RHEL High Availability software offers the ability to disable STONITH, fenced, or fencing functionality, Red Hat does not support clusters that have fencing disabled via those mechanisms.

  • pacemaker clusters: Cluster property stonith-enabled=false is not a supported configuration in pacemaker clusters. It must be set to true - the default value - for the cluster deployment in question to receive support and consideration from Red Hat on any High-Availability-related concern, whether that concern be inherently related to fencing or not.

  • cman clusters: FENCE_JOIN=no is not a supported configuration in cman clusters. It must be set to yes - the default value - for the cluster deployment in question to receive support and consideration from Red Hat on any High-Availability-related concern, whether that concern be inherently related to fencing or not.
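As a sketch of how to verify and restore the supported default on a pacemaker cluster (command forms assume a recent pcs version; exact subcommands vary between RHEL releases):

```shell
# Check the current value of the stonith-enabled cluster property:
pcs property list --all | grep stonith-enabled

# If it has been set to false, restore the supported default:
pcs property set stonith-enabled=true
```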


Every node must be managed by a fence device: For a cluster to receive support and consideration from Red Hat, every node in that cluster must have a configured fence device associated with it.

  • pacemaker clusters: pacemaker offers many ways for the cluster to dynamically or statically determine that a node can be managed by a particular stonith device in the configuration. Administrators should ensure that every node in the cluster is manageable by some stonith device configured in that cluster.

  • cman clusters without pacemaker: For every node in the cluster, there must exist at least one device in that node's <fence/> stanza in /etc/cluster/cluster.conf.
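As an illustrative sketch for a two-node pacemaker cluster, one power-based device per node could be configured as follows (the fence_ipmilan agent is real; the addresses, credentials, and node names are hypothetical, and parameter names such as ip/username/password vary between fence-agent versions):

```shell
# One IPMI-based fence device per node; pcmk_host_list ties each
# device to the node it is able to power off:
pcs stonith create fence-node1 fence_ipmilan \
    ip=192.0.2.11 username=admin password=secret \
    pcmk_host_list=node1.example.com
pcs stonith create fence-node2 fence_ipmilan \
    ip=192.0.2.12 username=admin password=secret \
    pcmk_host_list=node2.example.com

# Confirm the devices are configured and running:
pcs stonith status
```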


sbd watchdog-timeout fencing instead of a stonith device: sbd with watchdog-timeout fencing can be used in pacemaker clusters as an alternative to a fence-agent-based device - if configured according to all other relevant support policies applicable to sbd. All nodes must run sbd or else have an associated stonith device.
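A minimal sketch of enabling sbd watchdog fencing in a pacemaker cluster (the timeout value is illustrative, and sbd requires a functioning hardware watchdog on every node):

```shell
# sbd is enabled cluster-wide while the cluster is stopped:
pcs cluster stop --all
pcs stonith sbd enable
pcs cluster start --all

# Tell pacemaker to treat watchdog self-fencing as a valid
# fencing mechanism (value is illustrative):
pcs property set stonith-watchdog-timeout=10
```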


Clusters with shared block storage or DLM require power or storage-based devices: If the nodes of a cluster share access to block storage devices in any way - even if only in an active/passive manner - or if the cluster has any components that use DLM, then the cluster is subject to more stringent requirements around fencing. In such clusters, every node must be managed by a device that controls either:

  • The power state of that node (sbd qualifies here), or
  • Access to all block storage devices available to that node that are shared with other nodes.
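As a sketch of the storage-access option, a single fence_scsi device can revoke a failed node's SCSI registrations on the shared devices; note that it must cover all shared block devices, and the device paths and node names here are hypothetical:

```shell
# fence_scsi must list every block device shared between nodes,
# and requires unfencing so nodes re-register at startup:
pcs stonith create fence-scsi fence_scsi \
    devices=/dev/mapper/shared-lun1,/dev/mapper/shared-lun2 \
    pcmk_host_list="node1.example.com node2.example.com" \
    meta provides=unfencing
```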

If a node in a cluster with shared storage or DLM is associated only with a device using an alternative agent that does not manage power or storage access - such as fence_kdump - then that cluster will not receive support or consideration from Red Hat.

If a node in a cluster with shared storage is associated only with a device using a storage-based agent that does not control access to all block storage devices shared by the cluster, then that cluster will not receive support or consideration from Red Hat.

If a node is associated only with a device using a power-based agent that does not authoritatively control that node's power state, then that cluster will not receive support or consideration from Red Hat. For instance, if a node has a power-based device but that server has a redundant or independent power source that can keep the server operational through the disabling of the cluster-managed device, then that device does not meet the requirements for support.


Clusters with no shared block storage or DLM may use alternative agents and manual fencing: Clusters that do not share block storage in any way and do not use DLM may use devices with alternative agents that do not control power or storage access - such as fence_kdump - as their only automatic means of fencing. Red Hat's support for such use cases is subject to the following conditions:

  • Events which trigger fencing will execute the configured agent, and if that operation fails, an administrator must intervene to manually fence the node by powering it off. After manual fencing by powering off, the administrator can acknowledge to the cluster that manual fencing has taken place using the appropriate command - [pacemaker clusters] | [cman clusters]
  • Red Hat does not place a high priority on development of features or behaviors specific to the case where such a fence agent - one that does not manage access to shared resources - is in use. Cluster functionality is designed around configurations that employ proper power or storage-based fence mechanisms, and alternative mechanisms will not receive high priority in development.
  • Even without shared storage, some applications may behave incorrectly or present conflicts in some manner if manual fencing is acknowledged without the node in question having been properly powered off. Red Hat Support will not provide support or consideration for behaviors following manual-fence acknowledgement where it cannot be proven that the manually-fenced node was fully powered off before acknowledgement was provided.
  • Red Hat still recommends the usage of a power-based agent or sbd for optimal behavior in the cluster.
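The acknowledgement step above can be sketched as follows (node names are hypothetical, and the cman command syntax varies by release):

```shell
# Only after confirming the node is fully powered off:

# pacemaker clusters - tell the cluster the node is safely down:
pcs stonith confirm node1.example.com

# cman clusters - acknowledge manual fencing from a surviving node:
fence_ack_manual node1.example.com
```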

NOTE: Most Red Hat OpenStack Platform (RH-OSP) deployments with highly available controllers fall into this category of clusters without shared storage. While RH-OSP deployments may utilize distributed storage throughout such a cluster, those mechanisms do not carry the same conditions and considerations as true shared-block-storage setups. Red Hat still recommends power-based fencing or sbd in such setups, but these clusters may be used with alternative agents and manual fencing if preferred.


Limited support for environments using fence agents not provided by Red Hat: In cluster deployments utilizing any fence agent that is not distributed or supported by Red Hat, Red Hat Support may not assist with investigations or engagements in which fencing activity is involved. If problematic behavior results from or follows usage of a third-party fence agent, Red Hat may require that the behavior be reproduced in a configuration using only Red Hat-provided components in order for the investigation to proceed. Red Hat recommends using one of the power or storage fence agents it provides, or sbd.


Limitations around acknowledgement of manual fencing: Acknowledgement of manual fencing - [pacemaker clusters] | [cman clusters] - is intended only for execution by an administrator after a node has been confirmed to be powered off completely. Any behavior or scenario resulting from any other usage of such acknowledgement will not be considered or supported by Red Hat.
