Design Guidance for RHEL High Availability Clusters - VMware Virtual Machines as Cluster Members

Overview

Applicable Environments

  • Red Hat Enterprise Linux (RHEL) 7, 8 with the High Availability Add-On
  • Using a VMware platform for virtual-machine hosting
  • NOTE: This guide may focus on features that are applicable to the latest releases of these products.

Useful References and Guides

Introduction

This guide introduces administrators to Red Hat's recommendations, references, and considerations that may be useful in designing a RHEL High Availability cluster running on VMware virtual machines.

VIRTUALIZATION/HOST MANAGEMENT

Virtualization management recommendation: Use vSphere + vCenter Server

Reasons:

  • Provides several features that increase the reliability of the host cluster and the VM cluster - see the additional feature recommendations below for specifics.
  • Compared to manual management of hosts running the vSphere Hypervisor, reacting to failure conditions may be simpler and quicker - failed hosts are detected and VMs can be recovered quickly on a new host.
  • Red Hat focuses on vCenter Server configurations and features in RHEL High Availability development and testing.

Virtualization management alternative strategy: Manual administration of vSphere Hypervisor (ESXi)

Considerations:

  • The fence_vmware_soap STONITH method is only as reliable as each individual host - if a host becomes unavailable or unresponsive, the other VMs cannot use this method to fence their missing peers, and cluster operations will block. A per-host STONITH configuration is sketched after this list.
  • The VM cluster can be configured to use the hosts' power supplies as a secondary STONITH method, providing the ability to fence a missing peer by way of its host if that host becomes unresponsive.
  • Without vCenter Server, manual recovery of failed VMs may be slower, leaving the VM cluster with a degraded membership for longer. Design the VM cluster with enough capacity to continue operating after a host failure.
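
As an illustration of the per-host approach, the sketch below creates one fence_vmware_soap device per ESXi host. All hostnames, VM names, and credentials are hypothetical; each device can only fence the VMs registered on its own host, so the pcmk_host_map entries must match the actual VM placement.

    # Sketch only - node1's VM runs on esxi1, node2's VM runs on esxi2; each
    # device maps a cluster node name to its VM name on that host.
    pcs stonith create fence_esxi1 fence_vmware_soap \
        ip=esxi1.example.com username=root password=secret ssl_insecure=1 \
        pcmk_host_map="node1.example.com:node1-vm"
    pcs stonith create fence_esxi2 fence_vmware_soap \
        ip=esxi2.example.com username=root password=secret ssl_insecure=1 \
        pcmk_host_map="node2.example.com:node2-vm"

If VMs are ever moved to different hosts, these mappings must be updated to stay correct.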

Virtualization management alternative strategy: vRealize Suite

Considerations:

  • Red Hat has not assessed the suitability of vRealize Suite for managing RHEL High Availability clusters of VMs, as an alternative to vCenter Server or vSphere Hypervisor manual administration.
  • RHEL High Availability does not offer any special features or STONITH methods aimed at vRealize Suite - the environment must be compatible with RHEL HA's typical VMware-targeted capabilities, such as fence_vmware_soap or fence_scsi for STONITH.

Recommendations with vCenter Server: Highly available vCenter Server

Additional details:

  • Provide high availability for vCenter Server. See: kb.vmware.com - Supported vCenter Server High Availability Options.
  • For maximum reliability, provide a multi-site, highly available vCenter Server along with VMs in a multi-site RHEL High Availability cluster. See the multi-site considerations in the RHEL High Availability Cluster Configuration section below.

Reasons:

  • A highly available vCenter Server improves the reliability of the fence_vmware_soap STONITH method.
  • A highly available vCenter Server maintains administrator access to host and VM management following failures. During these periods, this access may be critical for recovering or administering the RHEL High Availability machines affected by the failure.

Recommendations with vCenter Server: vSphere HA

Additional details:

  • Enable host monitoring

Reasons:

  • vSphere HA with host monitoring increases the reliability of the fence_vmware_soap STONITH method.
  • DRS and vSphere HA with host monitoring provide quick recovery of VMs, preventing the VM cluster from remaining in a degraded state after a failure.

Recommendations with vCenter Server: Create vSphere DRS cluster

Additional details:

  • DRS Automation Level: Use Manual or Partially Automated - do not use Fully Automated.

Reasons:

  • Providing multiple hosts for each VM to run on, combined with DRS VM-host affinity, allows quick recovery of a VM to a new location after a host failure, instead of leaving the VM cluster with a degraded membership.

Virtualization-host considerations: Number of hosts

Recommendation: Provide a separate host for each clustered VM, and make one more host available for failover/maintenance activity

  • A single host failure should only have minimal impact on the VM cluster
  • An extra failover host allows for quick recovery of VMs after a host failure without having to double up VMs on a host - even if the host outage is prolonged.

Alternative configuration: Provide enough hosts so no host has 50% or more of clustered VMs

  • If 50% or more of the VMs share a host, then a single host failure can take down half or more of the members, costing the cluster quorum.

Alternative configuration: Only provide two hosts or allow 50%+ of VMs to share a host

  • Only rely on this configuration for less critical workloads, where manual intervention to recover or resume the cluster is acceptable.
  • Use a qdevice with the lms algorithm in the RHEL HA cluster configuration so the cluster can survive with 50% or less of its membership active - see the cluster design sections below for details.

Not recommended: Single host designs - Use only for test/development purposes.


Virtualization-host recommendations: Host infrastructure redundancy

Additional details:

  • Design hosts to be supported by separate infrastructure - power supplies, server rack switches, etc.
  • For extra reliability, spread hosts across different facilities or sites
    • With hosts spread across sites, review RHEL HA cluster multi-site designs and considerations in sections below

Reasons:

  • Hosts sharing power or network hardware create a single point of failure that could disrupt all RHEL HA members, causing an outage of the managed applications.

Host environment design recommendations: Network redundancy

Additional details:

  • Provide redundant uplink ports to the vSwitches for each of: the cluster member interconnect network, the application network, and the network used for accessing vCenter Server / vSphere Hypervisors (with fence_vmware_soap).
  • Alternatively, for the cluster member interconnect, a secondary network could be provided and used with RHEL HA's redundant interconnect features - the redundant ring protocol (RRP) on RHEL 7, or multiple knet links on RHEL 8 - as sketched below.
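
For example, a redundant interconnect can be requested when the cluster is created. The node names and addresses below are hypothetical; the RHEL 8 (corosync 3 / knet) form is shown first, followed by the RHEL 7 RRP form.

    # RHEL 8: two addresses per node create two knet links for the interconnect.
    pcs cluster setup my_cluster \
        node1.example.com addr=192.168.10.11 addr=10.10.10.11 \
        node2.example.com addr=192.168.10.12 addr=10.10.10.12

    # RHEL 7: ring0,ring1 addresses per node enable the redundant ring protocol.
    pcs cluster setup --name my_cluster \
        node1.example.com,node1-alt.example.com \
        node2.example.com,node2-alt.example.com --transport udpu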

Reasons:

  • Redundancy in the cluster's membership interconnect links is important for avoiding membership disruptions or splits - which can trigger application outages and recovery.
  • When using the fence_vmware_soap STONITH method, access to vCenter Server or the vSphere Hypervisors must be maintained so that recovery is not blocked if a member becomes unavailable.
  • Loss of the application/client access network can prevent the cluster's services from being reached by their users.

Virtualization-host configuration considerations: Cluster transport protocol

Considerations:

  • Different RHEL HA transport protocols have different network needs. Choose the one that is best for your cluster and plan the host-network configuration for it (a transport-selection sketch follows this list). See: Exploring features - Transport protocols and Design guidance - Selecting a transport protocol.
  • If using udp transport - requiring multicast capabilities on the network - consider VMware Multicast Filtering Modes. See: docs.vmware.com - Multicast Filtering Modes
    • Basic Multicast: If the physical network has proper multicast forwarding capabilities, this should work for a High Availability cluster.
    • Multicast Snooping: Physical switches between the ESXi hosts should have IGMP snooping enabled. Make sure vSphere's query time interval is set lower than any routing-table timeout on the physical switches.
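
As a brief illustration, the transport is selected when the cluster is created (hypothetical cluster and node names; the syntax differs between releases):

    # RHEL 8 (corosync 3): knet is the default; udp requires working multicast.
    pcs cluster setup my_cluster node1.example.com node2.example.com transport udp

    # RHEL 7 (corosync 2): choose udp (multicast) or udpu (unicast) explicitly.
    pcs cluster setup --name my_cluster node1.example.com node2.example.com --transport udp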

VIRTUAL MACHINE CONFIGURATION

VM distribution recommendation: Distribute VMs across as many hosts as are available

Additional details:

  • With vCenter Server: Use DRS VM-host affinity to distribute VMs across hosts, and define secondary-priority hosts that still keep the VMs separate after failover.
  • With other management strategies: Distribute VMs across hosts, have a plan to be notified if a host or VM failure occurs, and have a plan for how and where to recover VMs.

Reasons:

  • Running VMs on the same host puts the cluster at risk of losing several members from a single host failure.
  • Distributing VMs across hosts insulates the cluster from a single failure causing a large disruption.

VM administration requirement: Prevent live migration while VM is active in cluster

Additional details:

  • See: Support policies - General conditions with virtualized cluster members
  • If using DRS VM-host affinity (as in the recommendation above), live migration is prevented.
  • If manually administering VM distribution across hosts, or if not using DRS VM-host affinity, ensure policies and practices prevent any live migration while a VM is an active cluster member. Always stop the RHEL HA cluster services (pcs cluster stop) on a VM before migrating it, as shown after this list.
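
A minimal example of draining a member around a planned migration (hypothetical node name):

    # Stop the HA services on the member before its VM is migrated...
    pcs cluster stop node1.example.com
    # ...perform the migration, then rejoin the member afterwards.
    pcs cluster start node1.example.com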

Reasons:

  • Live migration introduces a pause in VM processing, which can disrupt High Availability membership (and is often observed to do so in real production environments). The length of the pause is unpredictable, and configuring HA to reliably tolerate it is difficult.
  • Live migration by vMotion takes special measures to update the multicast group registration of VMs at their new host. Red Hat has not assessed whether RHEL High Availability's udp (multicast) transport protocol is compatible with these measures.
  • Live migration may cause any static host-to-VM mappings in the cluster's STONITH configuration to become incorrect. Even with a plan to update the STONITH configuration before or after migration, there would be a period on one side of the migration or the other where the STONITH settings would be incorrect, and STONITH could fail if the VM became unresponsive - potentially blocking cluster operations.

VM resource configuration considerations: storage devices

Considerations:


VM system resource configuration considerations

See: Design guidance - Membership layout and member system specifications


RHEL HIGH AVAILABILITY CLUSTER CONFIGURATION

General membership layout considerations

See: Design guidance - Membership layout and member system specifications


STONITH recommendations: fence_vmware_soap or fence_vmware_rest as primary method

Additional details:

Reasons:

  • Power fencing is more reliable than storage-based fencing (like fence_scsi) at returning a cluster to its full membership without manual administrator intervention.
  • fence_vmware_soap in conjunction with a highly available vCenter Server can allow fencing to succeed even if the primary vCenter Server instance fails.
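
As a sketch of this recommendation, a single vCenter-backed device can fence any member. The vCenter hostname, credentials, and VM names below are hypothetical; fence_vmware_soap accepts equivalent parameters.

    # Sketch only - pcmk_host_map translates cluster node names to VM names
    # as they appear in the vCenter inventory.
    pcs stonith create vmfence fence_vmware_rest \
        ip=vcenter.example.com username=admin@vsphere.local password=secret \
        ssl_insecure=1 \
        pcmk_host_map="node1.example.com:node1-vm;node2.example.com:node2-vm"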

STONITH recommendations: fence_scsi as secondary method

Additional details:
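
As one possible shape for this, the sketch below layers fence_scsi behind a primary power-fencing device (the vmfence device from the previous sketch). The device path and node names are hypothetical; confirm that the shared device supports SCSI-3 persistent reservations before relying on it.

    # Sketch only - register fence_scsi against a shared disk, then make it the
    # second fencing level behind the primary power method.
    pcs stonith create scsifence fence_scsi \
        devices=/dev/disk/by-id/wwn-0x5000c50example \
        pcmk_host_list="node1.example.com node2.example.com" \
        meta provides=unfencing
    pcs stonith level add 1 node1.example.com vmfence
    pcs stonith level add 2 node1.example.com scsifence
    pcs stonith level add 1 node2.example.com vmfence
    pcs stonith level add 2 node2.example.com scsifence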

Reasons:


STONITH considerations: Virtualization-host power supply as secondary method

Additional details:

  • Configure the VM cluster to use a fallback STONITH method that controls a host's power source - fence_ipmilan, fence_apc_snmp, fence_cisco_ucs, fence_ilo4, etc.
  • IMPORTANT: There must be permanent VM-host affinity or a consistent manual distribution of VMs to hosts, so that STONITH can be configured to power off the correct host for a given VM. Failure to keep the VM distribution aligned with the STONITH configuration could result in data corruption or other conflicts.
  • Be aware that this gives the VM cluster the ability to power off a host, which might have other VMs running on it.
  • Usually only recommended for deployments without vCenter Server - fence_vmware_soap is less reliable there, so a backup method is useful. A brief example follows this list.
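
Continuing the hypothetical names from the per-host fence_vmware_soap sketch earlier, a host's power-management interface could be added as a second fencing level like this:

    # Sketch only - hypothetical BMC address and credentials; assumes node1's VM
    # is permanently pinned to the host whose BMC is reachable at 192.168.100.11.
    pcs stonith create ipmi_host1 fence_ipmilan \
        ip=192.168.100.11 username=admin password=secret lanplus=1 \
        pcmk_host_list="node1.example.com"
    # Try fence_vmware_soap against the host first, then fall back to host power.
    pcs stonith level add 1 node1.example.com fence_esxi1
    pcs stonith level add 2 node1.example.com ipmi_host1

Repeat for each member/host pair, and keep these mappings updated if VM placement ever changes.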

Quorum recommendations: Use qdevice

Additional details:

Reasons:

  • Increased reliability of the cluster in scenarios with member failures or link disruptions
  • qdevice allows the cluster to survive more member failures than a typical "majority wins" configuration would.
  • In membership splits, qdevice gives authority to the members that still have external connectivity and thus may be better able to serve clients or external users. A minimal configuration sketch follows this list.
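
A minimal qdevice sketch is shown below. The qdevice hostname is hypothetical, and the system running corosync-qnetd must sit outside the cluster; lms is the algorithm referenced in the host-count guidance above, with ffsplit as the alternative.

    # On the external system that will host the quorum device:
    pcs qdevice setup model net --enable --start
    # On one of the cluster members:
    pcs quorum device add model net host=qdevice.example.com algorithm=lms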

Multi-site design considerations

Considerations:

  • If targeting a single membership spanning multiple sites - be aware that reliable cross-site cluster STONITH is a challenge in VMware environments.

    • fence_vmware_soap and fence_vmware_rest depend on network connectivity to vCenter Server or the vSphere Hypervisors, so inter-site link disruptions can block fencing of VM cluster members spread across sites.
    • fence_scsi / fence_mpath may be limited by the lack of SCSI persistent reservation (SCSI-PR) support in storage replication products that target multi-site deployments. Consult the storage vendor's guidance on whether SCSI persistent reservations are handled correctly by their products.
    • Red Hat recommends trying to modify the use case to work as an active-passive system, using coordinated multi-site failover clusters with the booth ticket manager.
    • If using a single membership spanning multiple sites, plan for possible manual intervention by administrators to recover the cluster if fence_vmware_soap cannot reach its target.
    • A highly available vCenter Server configuration may help with STONITH reliability.
    • qdevice is strongly recommended to allow a cluster to intelligently resolve a split membership.
  • A design coordinating multi-site failover clusters with the booth ticket manager can achieve multi-site failover without needing cross-site fencing. This may be more appropriate for VMware multi-site deployments; a brief booth sketch follows this section.

    • booth configurations target active-passive applications.
    • Active-active applications that need to be active in multiple sites simultaneously would require a single membership spanning the sites, which in turn requires a reliable fence method that works across sites.
    • qdevice is also useful within each of these coordinated clusters, providing greater reliability without having to fail over to another site.
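
For reference, a heavily abbreviated booth sketch is shown below. The site, arbitrator, and resource names are hypothetical; see the RHEL booth documentation for the full procedure, which also covers syncing the configuration and creating the booth IP and service resources at each site.

    # Run on one node of a site cluster; the other site and the arbitrator
    # must also be set up and given the synced configuration.
    pcs booth setup sites 192.168.1.100 192.168.2.100 arbitrators 192.168.3.100
    pcs booth ticket add webticket
    # Tie the protected resource to the ticket so only the ticket-holding site runs it.
    pcs constraint ticket add webticket webfarm-group loss-policy=stop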
