Design Guidance for RHEL High Availability Clusters - VMware Virtual Machines as Cluster Members
Contents
- Overview
- Virtualization/host management
- Virtual machine configuration
- RHEL High Availability cluster configuration
Overview
Applicable Environments
- Red Hat Enterprise Linux (RHEL) 7, 8 with the High Availability Add-On
- Using a VMware platform for virtual-machine hosting
- NOTE: This guide may focus on features that are applicable to the latest releases of these products.
Recommended Prior Reading
Useful References and Guides
Introduction
This guide introduces administrators to Red Hat's recommendations, references, and considerations that may be useful in designing a RHEL High Availability cluster running on VMware virtual machines.
VIRTUALIZATION/HOST MANAGEMENT
Virtualization management recommendation: Use vSphere + vCenter Server
Reasons:
- Provides several features to increase reliability of host clusters and VM cluster - see additional feature recommendations below for specifics.
- Compared to manual management of hosts with vSphere Hypervisor, reacting to failure conditions may be simpler and quicker - quick recovery of VMs on a new host, and detection of failed hosts.
- Red Hat focuses on vCenter Server configurations and features in RHEL High Availability development and testing.
Virtualization management alternative strategy: Manual administration of vSphere Hypervisor (ESXi)
Considerations:
- The fence_vmware_soap STONITH method is only as reliable as each host - if a host becomes unavailable or unresponsive, other VMs would be unable to use this method to fence their missing peers, and cluster operations would block.
- The VM cluster can be configured to use the hosts' power supplies as a secondary STONITH method, for the ability to fence a missing peer by way of its host if the host becomes unresponsive.
- Without vCenter Server, manual recovery of failed VMs may be slower, leaving the VM cluster in a degraded membership for longer. Design the VM cluster with enough capacity to keep operating with the remaining members after a host failure.
Virtualization management alternative strategy: vRealize Suite
Considerations:
- Red Hat has not assessed the suitability of vRealize Suite for managing RHEL High Availability clusters of VMs, as an alternative to vCenter Server or vSphere Hypervisor manual administration.
- RHEL High Availability does not offer any special features or STONITH methods aimed at vRealize Suite - the environment must be compatible with the typical VMware-targeted capabilities of RHEL HA, like fence_vmware_soap or fence_scsi for STONITH.
Recommendations with vCenter Server: Highly available vCenter Server
Additional details:
- Provide high availability for vCenter Server. See: kb.vmware.com - Supported vCenter Server High Availability Options.
- For maximum reliability, provide multi-site highly available vCenter Server and VMs in multi-site RHEL High Availability cluster. See: multi-site considerations in the RHEL High Availability Configuration section below.
Reasons:
- A highly available vCenter Server provides better reliability of the fence_vmware_soap STONITH method.
- A highly available vCenter maintains administrator access to host and VM management following failures. During these times this access may be critical for recovering or administering RHEL High Availability machines affected by that failure.
Recommendations with vCenter Server: vSphere HA
Additional details:
- Enable host monitoring
Reasons:
- vSphere HA with host monitoring provides increased reliability of the fence_vmware_soap STONITH method.
- DRS and vSphere HA with host monitoring provide quick recovery of VMs, preventing the VM cluster from remaining in a degraded state after a failure.
Recommendations with vCenter Server: Create vSphere DRS cluster
Additional details:
- DRS Automation Level: Use Manual or Partially Automated - do not use Fully Automated.
Reasons:
- Providing multiple hosts for a VM to run on, combined with DRS VM-host affinity, allows quick recovery of the VM to a new location after a host failure, instead of leaving the VM cluster with a degraded membership.
Virtualization-host considerations: Number of hosts
Recommendation: Provide a separate host for each clustered VM, and make one more available for failover/maintenance activity
- A single host failure should only have minimal impact on the VM cluster
- Extra failover hosts allow for quick recovery of VMs after a host failure, while still not having to double-up VMs on a host - even if host outage is prolonged.
Alternative configuration: Provide enough hosts so no host has 50% or more of clustered VMs
- If 50% or more of the VMs share a host, a single host failure can take down half or more of the cluster members, costing the cluster quorum
Alternative configuration: Only provide two hosts or allow 50%+ of VMs to share a host
- Only rely on this configuration for less critical workloads, where manual intervention to recover/resume cluster is acceptable.
- Use a qdevice in the RHEL HA cluster configuration with the lms algorithm for the cluster to survive with 50% or less active membership - see the cluster design sections below for details.
Not recommended: Single host designs - Use only for test/development purposes.
Virtualization-host recommendations: Host infrastructure redundancy
Additional details:
- Design hosts to be supported by separate infrastructures - power supplies, server rack-switches, etc.
- For extra reliability, spread hosts across different facilities or sites
- With hosts spread across sites, review RHEL HA cluster multi-site designs and considerations in sections below
Reasons:
- Hosts sharing power or network hardware result in a single point of failure that could disrupt all RHEL HA members, causing an outage of managed applications.
Host environment design recommendations: Network redundancy
Additional details:
- For each of the cluster member interconnect network, the application network, and the network used for accessing vCenter Server / vSphere Hypervisors (with fence_vmware_soap): provide redundant uplink ports to vSwitches.
- For the cluster member interconnect network, a secondary network could alternatively be provided and used in conjunction with RHEL HA's redundant ring protocol (RRP) feature - see the sketch after this section.
Reasons:
- Redundancy in the cluster's membership interconnect links is important for avoiding membership disruptions or splits - which can trigger application outages and recovery.
- When using the fence_vmware_soap STONITH method, access to vCenter Server or vSphere Hypervisors must be maintained so that recovery is not blocked if a member becomes unavailable.
- Loss of the application/client access network can keep the cluster's services from being reached by its users.
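As an illustration of the redundant-interconnect alternative above, the sketch below shows how a second ring/link could be supplied at cluster-setup time. Hostnames and addresses are placeholders, and the exact syntax should be verified against the pcs documentation for your RHEL release.

```
# RHEL 7 (corosync RRP): list a second ring address for each node, comma-separated
pcs cluster setup --name ha-cluster \
    node1.example.com,node1-alt.example.com \
    node2.example.com,node2-alt.example.com

# RHEL 8 (knet links): supply one addr= per link for each node
pcs cluster setup ha-cluster \
    node1 addr=192.168.1.11 addr=192.168.2.11 \
    node2 addr=192.168.1.12 addr=192.168.2.12
```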
Virtualization-host configuration considerations: Cluster transport protocol
Considerations:
- Different RHEL HA transport protocols have different network needs. Choose which one is best for your cluster and consider host-network configuration for it. See: Exploring features - Transport protocols and Design guidance - Selecting a transport protocol.
- If using the udp transport - which requires multicast capabilities on the network - consider VMware Multicast Filtering Modes; a transport-selection sketch follows this section. See: docs.vmware.com - Multicast Filtering Modes
  - Basic Multicast: If the physical network has proper multicast forwarding capabilities, this should work for a High Availability cluster.
  - Multicast Snooping: Physical switches in between ESXi hosts should have IGMP snooping enabled. Make sure vSphere's query time interval is set to less than any routing-table timeout on the physical switches.
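As a rough sketch of how the transport choice is expressed at cluster-setup time (cluster and node names are placeholders; defaults differ by release - udpu on RHEL 7, knet on RHEL 8):

```
# RHEL 7: request the multicast-based udp transport instead of the default udpu
pcs cluster setup --name ha-cluster node1 node2 --transport udp

# RHEL 8: knet is the default; udp can be requested if multicast is preferred
pcs cluster setup ha-cluster node1 node2 transport udp
```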
VIRTUAL MACHINE CONFIGURATION
VM distribution recommendation: Distribute VMs across as many hosts as are available
Additional details:
- With vCenter Server: Use DRS VM-host affinity to distribute VMs across hosts, and offer secondary priority hosts that still try to keep the VMs separate after failover.
- With other management strategies: Distribute VMs across hosts, and have a plan to be notified if a host or VM failure occurs, and a plan for how/where to recover VMs
Reasons:
- Running VMs on the same host puts the cluster at risk of losing several members from a single host failure.
- Distributing VMs across hosts insulates the cluster from a single failure causing a large disruption.
VM administration requirement: Prevent live migration while VM is active in cluster
Additional details:
- See: Support policies - General conditions with virtualized cluster members
- If using DRS VM-host affinity (as in above recommendation), live migration is prevented.
- If manually administering VM distribution across hosts, or if not using DRS VM-host affinity, ensure policies and practices prevent any live migration while a VM is active. Always stop the RHEL HA cluster services (pcs cluster stop) before migrating any VM - see the example after this section.
Reasons:
- Live migration introduces a pause in VM processing, which can disrupt High Availability membership (and is frequently observed to do so in real production environments). This pause is unpredictable, and configuring HA to definitively avoid it is difficult.
- Live migration by vMotion takes special measures to update the multicast group registration of VMs at their new host. Red Hat has not assessed whether RHEL High Availability's udp (multicast) transport protocol is compatible with these measures.
- Live migration may cause any static host-to-VM mappings in the cluster's STONITH configuration to become incorrect. Even if there is a plan to update the STONITH configuration before or after migration, there would be a period at either the front end or back end of the migration where STONITH settings would be incorrect and STONITH could fail if the VM became unresponsive - potentially blocking cluster operations.
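A minimal sketch of the stop-before-migrate practice referenced above, run on the VM that is about to be moved:

```
# Leave the cluster cleanly before the VM is migrated
pcs cluster stop

# ... perform the migration to the planned target host ...

# Rejoin the cluster once the VM is running on its new host and
# any STONITH host/VM mappings have been updated to match
pcs cluster start
```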
VM resource configuration considerations: storage devices
Considerations:
- Consider Red Hat's policy and suggestions related to VMware storage access methods: Support policies - VMware virtual machines as cluster members
- If using the fence_scsi or fence_mpath STONITH method, see special conditions: Support policies - fence_scsi and fence_mpath
- Unless using fence_scsi or fence_mpath, Red Hat does not prefer or recommend any particular storage configuration for RHEL High Availability. Consult with VMware for guidance.
VM system resource configuration considerations
See: Design guidance - Membership layout and member system specifications
RHEL HIGH AVAILABILITY CLUSTER CONFIGURATION
General membership layout considerations
See: Design guidance - Membership layout and member system specifications
STONITH recommendations: fence_vmware_soap or fence_vmware_rest as primary method
Additional details:
- Configure the STONITH device to point at vCenter Server if it is in use, or configure one STONITH device per vSphere Hypervisor otherwise.
- Be sure to configure the STONITH-device attribute pcmk_host_map="<nodename>:<VM name>;<nodename>:<VM name>[...]" if VM names in VMware do not match the cluster-node names in the RHEL HA configuration - see the example after this section. See: Red Hat knowledge solution - STONITH configuration when port values don't match node names
Reasons:
- Power fencing is more reliable than storage fencing (like fence_scsi) at returning a cluster to its full membership without manual administrator intervention.
- fence_vmware_soap in conjunction with a highly available vCenter Server can allow fencing to succeed even if the primary vCenter Server fails.
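A minimal sketch of a fence_vmware_soap STONITH device pointed at vCenter Server, assuming hypothetical node names (node1, node2), VM names, and credentials. Parameter names vary slightly across RHEL releases; check pcs stonith describe fence_vmware_soap for the set available in your environment.

```
# Hypothetical vCenter address, credentials, and VM names - replace with real values
pcs stonith create vmfence fence_vmware_soap \
    ip=vcenter.example.com ssl=1 ssl_insecure=1 \
    username=fenceuser password=fencepass \
    pcmk_host_map="node1:rhel-ha-vm1;node2:rhel-ha-vm2"
```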
STONITH recommendations: fence_scsi as secondary method
Additional details:
- For instruction, see:
- Multipathing is most often done at the host layer and abstracted from the VMs; fence_scsi is appropriate there. Use fence_mpath only if VMs are running device-mapper-multipath internally with direct-access multipath LUs.
- If using fence_scsi or fence_mpath, configure storage to adhere to Red Hat's requirements for fence_scsi/fence_mpath on VMware platforms - see the example after this section. See: Support Policies for RHEL High Availability Clusters - fence_scsi and fence_mpath - Red Hat Customer Portal
Reasons:
- Storage fencing leaves a failed member in a powered-on state, requiring either a manual reboot or the use of a special watchdog script. See: Red Hat Knowledge Solution - fence_scsi watchdog
- May only be compatible with certain storage configurations and products. See: Support policies - fence_scsi and fence_mpath
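A minimal sketch of a fence_scsi device as a secondary method, assuming a hypothetical shared-device path and node names; fence_scsi requires unfencing at node startup, expressed via meta provides=unfencing.

```
# Hypothetical shared-device path and node names - replace with real values
pcs stonith create scsi-fence fence_scsi \
    devices=/dev/disk/by-id/scsi-36001405abcdef \
    pcmk_host_list="node1 node2" \
    meta provides=unfencing
```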
STONITH considerations: Virtualization-host power supply as secondary method
Additional details:
- Configure the VM cluster to use a fallback STONITH method that controls a host's power source - fence_ipmilan, fence_apc_snmp, fence_cisco_ucs, fence_ilo4, etc. - see the example after this section.
- IMPORTANT: There must be a permanent VM-host affinity or a consistent manual distribution of VMs to hosts, so STONITH can be configured to correctly power off the right host for a given VM. Failure to keep the VM distribution aligned with the STONITH configuration could result in data corruption or other conflicts.
- Be aware this gives the VM cluster the ability to power off a host, which might have other VMs running on it.
- Usually only recommended for deployments without vCenter Server - fence_vmware_soap is less reliable there, so it is useful to have a backup method.
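A minimal sketch of a host-power fallback using fence_ipmilan and fencing levels, assuming a primary device named vmfence already exists and that node1 always runs on the host whose BMC is addressed below (all names and credentials are placeholders):

```
# Hypothetical BMC address and credentials for the host that always runs node1
pcs stonith create ipmi-node1 fence_ipmilan \
    ip=host1-bmc.example.com username=admin password=secret lanplus=1 \
    pcmk_host_list=node1

# Level 1: try the VMware method first; level 2: fall back to host power control
pcs stonith level add 1 node1 vmfence
pcs stonith level add 2 node1 ipmi-node1
```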
Quorum recommendations: Use qdevice
Additional details:
- Use of qdevice is strongly recommended - especially if 50% or more of the cluster members may be running on a single host at any time - otherwise a host failure may block cluster operations. See the example after this section.
- Use the qdevice algorithm lms to allow less than half of the members to continue operations.
- Host the corosync-qnetd server in a neutral external host cluster, a separate facility, or on a bare-metal machine, so that it is less at risk of being disrupted by any failure scenario that may affect the VM cluster members. corosync-qnetd needs to survive so it can help decide membership quorum; if it is hosted alongside cluster members, it may be disrupted along with them.
- For more information on qdevice, see:
  - Administrative procedures - Deploying a RHEL 7 qnetd server
  - Administrative procedures - Enabling qdevice quorum arbitration in RHEL 7
  - Support policies - corosync-qdevice and corosync-qnetd
  - Explore components - corosync-qdevice and corosync-qnetd
  - Design guidance - Considerations with qdevice quorum arbitration
Reasons:
- Increased reliability of the cluster in scenarios with member failures or link disruptions
- qdevice allows the cluster to survive more member failures than a typical "majority wins" configuration
- In membership splits, qdevice gives authority to members that still have external connectivity and thus may be more able to serve clients or external users.
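A minimal sketch of adding a qdevice with the lms algorithm, assuming a hypothetical external host qnetd.example.com that is not a cluster member:

```
# On the external qnetd host (hypothetical hostname):
pcs qdevice setup model net --enable --start

# On one node of the VM cluster:
pcs quorum device add model net host=qnetd.example.com algorithm=lms
```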
Multi-site design considerations
Considerations:
- If targeting a single membership spanning multiple sites, be aware that reliable cross-site cluster STONITH is a challenge in VMware environments.
  - fence_vmware_soap and fence_vmware_rest are dependent on network connectivity to vCenter Server or vSphere Hypervisors, so inter-site link disruptions can block fencing of VM cluster members spread across sites.
  - fence_scsi/fence_mpath may be limited by the lack of SCSI-PR capabilities in storage replication products that target multi-site deployments. Consult storage vendor guidance on whether SCSI persistent reservations would be handled sanely with their products.
  - Red Hat recommends trying to modify the use case to work as an active-passive system, using a coordinated multi-site failover cluster design with the booth ticket manager.
  - If using a single membership spanning multiple sites, plan for possible manual intervention by administrators to recover the cluster if fence_vmware_soap cannot reach its target.
  - A highly available vCenter Server configuration may help with STONITH reliability.
  - qdevice is strongly recommended to allow the cluster to intelligently resolve a split membership.
- Using a "Coordinating multi-site failover clusters with booth ticket manager" design can achieve multi-site failover without needing cross-site fencing. This may be more appropriate for VMware multi-site deployments - see the sketch after this list.
  - booth configurations target active-passive applications.
  - Active-active applications that need to be active in multiple sites simultaneously would require a single membership spanning sites, which requires a reliable fence method that works across sites.
  - qdevice is also useful within these coordinating clusters, for greater reliability without having to fail over to another site.
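A minimal sketch of the booth-based coordination approach, assuming hypothetical site and arbitrator addresses, a ticket named apacheticket, and an existing resource my-app; the full procedure (including distributing the booth configuration to the other site and the arbitrator) is covered in the Red Hat multi-site cluster documentation.

```
# On one node of the first site cluster (addresses are placeholders):
pcs booth setup sites 192.168.11.100 192.168.22.100 arbitrators 192.168.99.100
pcs booth ticket add apacheticket

# On each site cluster, create the booth resources using that site's booth address,
# then tie the application resource to the ticket:
pcs booth create ip 192.168.11.100
pcs constraint ticket add apacheticket my-app
```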