Design Guidance for RHEL High Availability Clusters - VMware Virtual Machines as Cluster Members
Contents
- Overview
- Virtualization/host management
- Virtual machine configuration
- RHEL High Availability cluster configuration
Overview
Applicable Environments
- Red Hat Enterprise Linux (RHEL) 7, 8 with the High Availability Add-On
- Using a VMware platform for virtual-machine hosting
- NOTE: This guide may focus on features that are applicable to the latest releases of these products.
Recommended Prior Reading
Useful References and Guides
Introduction
This guide introduces administrators to Red Hat's recommendations, references, and considerations that may be useful in designing a RHEL High Availability cluster running on VMware virtual machines.
VIRTUALIZATION/HOST MANAGEMENT
Virtualization management recommendation: Use vSphere + vCenter Server
Reasons:
- Provides several features to increase reliability of host clusters and VM cluster - see additional feature recommendations below for specifics.
- Compared to manual management of hosts with vSphere Hypervisor, reacting to failure conditions may be simpler and quicker - quick recovery of VMs on a new host, and detection of failed hosts.
- Red Hat focuses on vCenter Server configurations and features in RHEL High Availability development and testing.
Virtualization management alternative strategy: Manual administration of vSphere Hypervisor (ESXi)
Considerations:
- The fence_vmware_soap STONITH method is only as reliable as each host - if a host becomes unavailable or unresponsive, other VMs would be unable to use this method to fence their missing peers, and cluster operations would block.
- The VM cluster can be configured to use the hosts' power supplies as a secondary STONITH method, for the ability to fence a missing peer by way of its host if the host becomes unresponsive.
- Without vCenter Server, manual recovery of failed VMs may be slower, leaving the VM cluster in a degraded membership for longer. Design the VM cluster with enough capacity to keep operating with the remaining members after a host failure.
Virtualization management alternative strategy: vRealize Suite
Considerations:
- Red Hat has not assessed the suitability of vRealize Suite for managing RHEL High Availability clusters of VMs, as an alternative to vCenter Server or vSphere Hypervisor manual administration.
- RHEL High Availability does not offer any special features or STONITH methods aimed at vRealize Suite - the environment must be compatible with the typical VMware-targeted capabilities of RHEL HA, like fence_vmware_soap or fence_scsi for STONITH.
Recommendations with vCenter Server: Highly available vCenter Server
Additional details:
- Provide high availability for vCenter Server. See: kb.vmware.com - Supported vCenter Server High Availability Options.
- For maximum reliability, provide multi-site highly available vCenter Server and VMs in multi-site RHEL High Availability cluster. See: multi-site considerations in the RHEL High Availability Configuration section below.
Reasons:
- A highly available vCenter Server provides better reliability of the fence_vmware_soap STONITH method.
- A highly available vCenter maintains administrator access to host and VM management following failures. During these times this access may be critical for recovering or administering RHEL High Availability machines affected by that failure.
Recommendations with vCenter Server: vSphere HA
Additional details:
- Enable host monitoring
Reasons:
- vSphere HA with host monitoring provides increased reliability of the fence_vmware_soap STONITH method.
- DRS and vSphere HA with host monitoring provide quick recovery of VMs, preventing the VM cluster from remaining in a degraded state after a failure.
Recommendations with vCenter Server: Create vSphere DRS cluster
Additional details:
- DRS Automation Level: Use Manual or Partially Automated - do not use Fully Automated.
Reasons:
- Providing multiple hosts for a VM to run on, combined with DRS VM-host affinity, allows quick recovery of the VM to a new location after a host failure, instead of leaving the VM cluster with a degraded membership.
Virtualization-host considerations: Number of hosts
Recommendation: Provide a separate host for each clustered VM, and make one more available for failover/maintenance activity
- A single host failure should only have minimal impact on the VM cluster
- Extra failover hosts allow for quick recovery of VMs after a host failure, while still not having to double-up VMs on a host - even if host outage is prolonged.
Alternative configuration: Provide enough hosts so no host has 50% or more of clustered VMs
- If 50% or more of the VMs share a host, a single host failure can take down half or more of the cluster members, costing the cluster quorum
Alternative configuration: Only provide two hosts or allow 50%+ of VMs to share a host
- Only rely on this configuration for less critical workloads, where manual intervention to recover/resume cluster is acceptable.
- Use a qdevice in the RHEL HA cluster configuration with the lms algorithm for the cluster to survive with 50% or less active membership - see the cluster design sections below for details.
Not recommended: Single host designs - Use only for test/development purposes.
Virtualization-host recommendations: Host infrastructure redundancy
Additional details:
- Design hosts to be supported by separate infrastructures - power supplies, server rack-switches, etc.
- For extra reliability, spread hosts across different facilities or sites
- With hosts spread across sites, review RHEL HA cluster multi-site designs and considerations in sections below
Reasons:
- Hosts sharing power or network hardware result in a single point of failure that could disrupt all RHEL HA members, causing an outage of managed applications.
Host environment design recommendations: Network redundancy
Additional details:
- For each of the cluster member interconnect network, the application network, and the network used for accessing vCenter Server / vSphere Hypervisors (with fence_vmware_soap): provide redundant uplink ports to vSwitches.
- For the cluster member interconnect network, a secondary network could alternatively be provided and used in conjunction with RHEL HA's redundant ring protocol (RRP) feature - see the sketch after this section.
Reasons:
- Redundancy in the cluster's membership interconnect links is important for avoiding membership disruptions or splits - which can trigger application outages and recovery.
- When using the fence_vmware_soap STONITH method, access to vCenter Server or vSphere Hypervisors must be maintained so that recovery is not blocked if a member becomes unavailable.
- Loss of the application/client access network can keep the cluster's services from being reached by its users.
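As an illustration of the redundant-interconnect alternative above, the sketch below shows how a second ring/link could be supplied at cluster-setup time. Hostnames and addresses are placeholders, and the exact syntax should be verified against the pcs documentation for your RHEL release.

```
# RHEL 7 (corosync RRP): list a second ring address for each node, comma-separated
pcs cluster setup --name ha-cluster \
    node1.example.com,node1-alt.example.com \
    node2.example.com,node2-alt.example.com

# RHEL 8 (knet links): supply one addr= per link for each node
pcs cluster setup ha-cluster \
    node1 addr=192.168.1.11 addr=192.168.2.11 \
    node2 addr=192.168.1.12 addr=192.168.2.12
```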
Virtualization-host configuration considerations: Cluster transport protocol
Considerations:
- Different RHEL HA transport protocols have different network needs. Choose which one is best for your cluster and consider host-network configuration for it. See: Exploring features - Transport protocols and Design guidance - Selecting a transport protocol.
- If using the udp transport - which requires multicast capabilities on the network - consider VMware Multicast Filtering Modes; a transport-selection sketch follows this section. See: docs.vmware.com - Multicast Filtering Modes
  - Basic Multicast: If the physical network has proper multicast forwarding capabilities, this should work for a High Availability cluster.
  - Multicast Snooping: Physical switches in between ESXi hosts should have IGMP snooping enabled. Make sure vSphere's query time interval is set to less than any routing-table timeout on the physical switches.
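As a rough sketch of how the transport choice is expressed at cluster-setup time (cluster and node names are placeholders; defaults differ by release - udpu on RHEL 7, knet on RHEL 8):

```
# RHEL 7: request the multicast-based udp transport instead of the default udpu
pcs cluster setup --name ha-cluster node1 node2 --transport udp

# RHEL 8: knet is the default; udp can be requested if multicast is preferred
pcs cluster setup ha-cluster node1 node2 transport udp
```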
VIRTUAL MACHINE CONFIGURATION
VM distribution recommendation: Distribute VMs across as many hosts as are available
Additional details:
- With vCenter Server: Use DRS VM-host affinity to distribute VMs across hosts, and offer secondary priority hosts that still try to keep the VMs separate after failover.
- With other management strategies: Distribute VMs across hosts, and have a plan to be notified if a host or VM failure occurs, and a plan for how/where to recover VMs
Reasons:
- Running VMs on the same host puts the cluster at risk of losing several members from a single host failure.
- Distributing VMs across hosts insulates the cluster from a single failure causing a large disruption.
VM administration requirement: Prevent live migration while VM is active in cluster
Additional details:
- See: Support policies - General conditions with virtualized cluster members
- If using DRS VM-host affinity (as in above recommendation), live migration is prevented.
- If manually administering VM distribution across hosts, or if not using DRS VM-host affinity, ensure policies and practices prevent any live migration while a VM is active. Always stop the RHEL HA cluster services (pcs cluster stop) before migrating any VM - see the example after this section.
Reasons:
- Live migration introduces a pause in VM processing, which can disrupt High Availability membership (and is frequently observed to do so in real production environments). This pause is unpredictable, and configuring HA to definitively avoid it is difficult.
- Live migration by vMotion takes special measures to update the multicast group registration of VMs at their new host. Red Hat has not assessed whether RHEL High Availability's udp (multicast) transport protocol is compatible with these measures.
- Live migration may cause any static host-to-VM mappings in the cluster's STONITH configuration to become incorrect. Even if there is a plan to update the STONITH configuration before or after migration, there would be a period at either the front end or back end of the migration where STONITH settings would be incorrect and STONITH could fail if the VM became unresponsive - potentially blocking cluster operations.
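A minimal sketch of the stop-before-migrate practice referenced above, run on the VM that is about to be moved:

```
# Leave the cluster cleanly before the VM is migrated
pcs cluster stop

# ... perform the migration to the planned target host ...

# Rejoin the cluster once the VM is running on its new host and
# any STONITH host/VM mappings have been updated to match
pcs cluster start
```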
VM resource configuration considerations: storage devices
Considerations:
- Consider Red Hat's policy and suggestions related to VMware storage access methods: Support policies - VMware virtual machines as cluster members
- If using the fence_scsi or fence_mpath STONITH method, see special conditions: Support policies - fence_scsi and fence_mpath
- Unless using fence_scsi or fence_mpath, Red Hat does not prefer or recommend any particular storage configuration for RHEL High Availability. Consult with VMware for guidance.
VM system resource configuration considerations
See: Design guidance - Membership layout and member system specifications
RHEL HIGH AVAILABILITY CLUSTER CONFIGURATION
General membership layout considerations
See: Design guidance - Membership layout and member system specifications
STONITH recommendations: fence_vmware_soap or fence_vmware_rest as primary method
Additional details:
- Configure the STONITH device to point at vCenter Server if it is in use, or configure one STONITH device per vSphere Hypervisor otherwise.
- Be sure to configure the STONITH-device attribute pcmk_host_map="<nodename>:<VM name>;<nodename>:<VM name>[...]" if VM names in VMware do not match the cluster-node names in the RHEL HA configuration - see the example after this section. See: Red Hat knowledge solution - STONITH configuration when port values don't match node names
Reasons:
- Power fencing is more reliable than storage fencing (like fence_scsi) at returning a cluster to its full membership without manual administrator intervention.
- fence_vmware_soap in conjunction with a highly available vCenter Server can allow fencing to succeed even if the primary vCenter Server fails.
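A minimal sketch of a fence_vmware_soap STONITH device pointed at vCenter Server, assuming hypothetical node names (node1, node2), VM names, and credentials. Parameter names vary slightly across RHEL releases; check pcs stonith describe fence_vmware_soap for the set available in your environment.

```
# Hypothetical vCenter address, credentials, and VM names - replace with real values
pcs stonith create vmfence fence_vmware_soap \
    ip=vcenter.example.com ssl=1 ssl_insecure=1 \
    username=fenceuser password=fencepass \
    pcmk_host_map="node1:rhel-ha-vm1;node2:rhel-ha-vm2"
```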
STONITH recommendations: fence_scsi as secondary method
Additional details:
- For instruction, see:
- Multipathing is most often done at the host layer and abstracted from the VMs; fence_scsi is appropriate there. Use fence_mpath only if VMs are running device-mapper-multipath internally with direct-access multipath LUs.
- If using fence_scsi or fence_mpath, configure storage to adhere to Red Hat's requirements for fence_scsi/fence_mpath on VMware platforms - see the example after this section. See: Support Policies for RHEL High Availability Clusters - fence_scsi and fence_mpath - Red Hat Customer Portal
Reasons:
- Storage fencing leaves a failed member in a powered-on state, requiring either a manual reboot or the use of a special watchdog script. See: Red Hat Knowledge Solution - fence_scsi watchdog
- May only be compatible with certain storage configurations and products. See: Support policies - fence_scsi and fence_mpath
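A minimal sketch of a fence_scsi device as a secondary method, assuming a hypothetical shared-device path and node names; fence_scsi requires unfencing at node startup, expressed via meta provides=unfencing.

```
# Hypothetical shared-device path and node names - replace with real values
pcs stonith create scsi-fence fence_scsi \
    devices=/dev/disk/by-id/scsi-36001405abcdef \
    pcmk_host_list="node1 node2" \
    meta provides=unfencing
```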
STONITH considerations: Virtualization-host power supply as secondary method
Additional details:
- Configure the VM cluster to use a fallback STONITH method that controls a host's power source - fence_ipmilan, fence_apc_snmp, fence_cisco_ucs, fence_ilo4, etc. - see the example after this section.
- IMPORTANT: There must be a permanent VM-host affinity or a consistent manual distribution of VMs to hosts, so STONITH can be configured to correctly power off the right host for a given VM. Failure to keep the VM distribution aligned with the STONITH configuration could result in data corruption or other conflicts.
- Be aware this gives the VM cluster the ability to power off a host, which might have other VMs running on it.
- Usually only recommended for deployments without vCenter Server - fence_vmware_soap is less reliable there, so it is useful to have a backup method.
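A minimal sketch of a host-power fallback using fence_ipmilan and fencing levels, assuming a primary device named vmfence already exists and that node1 always runs on the host whose BMC is addressed below (all names and credentials are placeholders):

```
# Hypothetical BMC address and credentials for the host that always runs node1
pcs stonith create ipmi-node1 fence_ipmilan \
    ip=host1-bmc.example.com username=admin password=secret lanplus=1 \
    pcmk_host_list=node1

# Level 1: try the VMware method first; level 2: fall back to host power control
pcs stonith level add 1 node1 vmfence
pcs stonith level add 2 node1 ipmi-node1
```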
Quorum recommendations: Use qdevice
Additional details:
- Use of qdevice is strongly recommended - especially if 50% or more of the cluster members may be running on a single host at any time - otherwise a host failure may block cluster operations. See the example after this section.
- Use the qdevice algorithm lms to allow less than half of the members to continue operations.
- Host the corosync-qnetd server in a neutral external host cluster, a separate facility, or on a bare-metal machine, so that it is less at risk of being disrupted by any failure scenario that may affect the VM cluster members. corosync-qnetd needs to survive so it can help decide membership quorum; if it is hosted alongside cluster members, it may be disrupted along with them.
- For more information on qdevice, see:
  - Administrative procedures - Deploying a RHEL 7 qnetd server
  - Administrative procedures - Enabling qdevice quorum arbitration in RHEL 7
  - Support policies - corosync-qdevice and corosync-qnetd
  - Explore components - corosync-qdevice and corosync-qnetd
  - Design guidance - Considerations with qdevice quorum arbitration
Reasons:
- Increased reliability of the cluster in scenarios with member failures or link disruptions
- qdevice allows the cluster to survive more member failures than a typical "majority wins" configuration
- In membership splits, qdevice gives authority to members that still have external connectivity and thus may be more able to serve clients or external users.
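A minimal sketch of adding a qdevice with the lms algorithm, assuming a hypothetical external host qnetd.example.com that is not a cluster member:

```
# On the external qnetd host (hypothetical hostname):
pcs qdevice setup model net --enable --start

# On one node of the VM cluster:
pcs quorum device add model net host=qnetd.example.com algorithm=lms
```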
Multi-site design considerations
Considerations:
- If targeting a single membership spanning multiple sites, be aware that reliable cross-site cluster STONITH is a challenge in VMware environments.
  - fence_vmware_soap and fence_vmware_rest are dependent on network connectivity to vCenter Server or vSphere Hypervisors, so inter-site link disruptions can block fencing of VM cluster members spread across sites.
  - fence_scsi/fence_mpath may be limited by the lack of SCSI-PR capabilities in storage replication products that target multi-site deployments. Consult storage vendor guidance on whether SCSI persistent reservations would be handled sanely with their products.
  - Red Hat recommends trying to modify the use case to work as an active-passive system, using a coordinated multi-site failover cluster design with the booth ticket manager.
  - If using a single membership spanning multiple sites, plan for possible manual intervention by administrators to recover the cluster if fence_vmware_soap cannot reach its target.
  - A highly available vCenter Server configuration may help with STONITH reliability.
  - qdevice is strongly recommended to allow the cluster to intelligently resolve a split membership.
- Using a "Coordinating multi-site failover clusters with booth ticket manager" design can achieve multi-site failover without needing cross-site fencing. This may be more appropriate for VMware multi-site deployments - see the sketch after this list.
  - booth configurations target active-passive applications.
  - Active-active applications that need to be active in multiple sites simultaneously would require a single membership spanning sites, which requires a reliable fence method that works across sites.
  - qdevice is also useful within these coordinating clusters, for greater reliability without having to fail over to another site.
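A minimal sketch of the booth-based coordination approach, assuming hypothetical site and arbitrator addresses, a ticket named apacheticket, and an existing resource my-app; the full procedure (including distributing the booth configuration to the other site and the arbitrator) is covered in the Red Hat multi-site cluster documentation.

```
# On one node of the first site cluster (addresses are placeholders):
pcs booth setup sites 192.168.11.100 192.168.22.100 arbitrators 192.168.99.100
pcs booth ticket add apacheticket

# On each site cluster, create the booth resources using that site's booth address,
# then tie the application resource to the ticket:
pcs booth create ip 192.168.11.100
pcs constraint ticket add apacheticket my-app
```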