Chapter 1. Red Hat OpenStack Platform high availability overview and planning

Red Hat OpenStack Platform (RHOSP) high availability (HA) is a collection of services that orchestrate failover and recovery for your deployment. When you plan your HA deployment, ensure that you review the considerations for different aspects of the environment, such as hardware assignments and network configuration.

1.1. Red Hat OpenStack Platform high availability services

Red Hat OpenStack Platform (RHOSP) employs several technologies to provide the services required to implement high availability (HA). These services include Galera, RabbitMQ, Redis, HAProxy, individual services that Pacemaker manages, and Systemd and plain container services that Podman manages.

1.1.1. Service types

Core container

Core container services are Galera, RabbitMQ, Redis, and HAProxy. These services run on all Controller nodes and require specific management and constraints for the start, stop, and restart actions. You use Pacemaker to launch, manage, and troubleshoot core container services.


RHOSP uses the MariaDB Galera Cluster to manage database replication.
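Because Pacemaker owns these services, inspect and restart them with the pcs command rather than with systemctl or podman. The following is a minimal sketch; `galera-bundle` is a typical RHOSP bundle resource name, and the exact names in your cluster can differ. Run the commands as root, or with sudo, on a Controller node:

```shell
# View the status of all Pacemaker-managed resources on the cluster
sudo pcs status

# Restart a core container service through Pacemaker so that its
# ordering and colocation constraints are respected
sudo pcs resource restart galera-bundle
```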

Active-passive

Active-passive services run on one Controller node at a time, and include services such as openstack-cinder-volume. To move an active-passive service, you must use Pacemaker to ensure that the correct stop-start sequence is followed.
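For example, to restart or relocate an active-passive service such as openstack-cinder-volume, go through Pacemaker so that the stop-start sequencing is preserved. A sketch, assuming the default resource name:

```shell
# Restart the active-passive service on its current node
sudo pcs resource restart openstack-cinder-volume

# Alternatively, disable and re-enable the resource so that
# Pacemaker stops it cleanly and starts it on an eligible node
sudo pcs resource disable openstack-cinder-volume
sudo pcs resource enable openstack-cinder-volume
```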
Systemd and plain container

Systemd and plain container services are independent services that can withstand a service interruption. Therefore, if you restart a high availability service such as Galera, you do not need to manually restart any other service, such as nova-api. You can use systemd or Podman to directly manage systemd and plain container services.

When orchestrating your HA deployment, director uses templates and Puppet modules to ensure that all services are configured and launched correctly. In addition, when troubleshooting HA issues, you must interact with services in the HA framework using the podman command or the systemctl command.
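The following sketch shows typical troubleshooting commands for systemd and plain container services. The `tripleo_nova_api` unit name is an example of the RHOSP naming convention and can differ between releases:

```shell
# List the containers that Podman manages on this node
sudo podman ps --format "{{.Names}} {{.Status}}"

# Check and restart a systemd-managed container service directly;
# no Pacemaker constraints apply to these services
sudo systemctl status tripleo_nova_api
sudo systemctl restart tripleo_nova_api
```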

1.1.2. Service modes

HA services can run in one of the following modes:


Active-active

Pacemaker runs the same service on multiple Controller nodes, and uses HAProxy to distribute traffic across the nodes or to a specific Controller node with a single IP address. In some cases, HAProxy distributes traffic to active-active services with Round Robin scheduling. You can add more Controller nodes to improve performance.


Active-active mode is supported only in distributed compute node (DCN) architecture at Edge sites.
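In HAProxy terms, an active-active service maps to a listener that balances across all Controller nodes. The following haproxy.cfg fragment is illustrative only; the service, server names, and addresses are examples:

```
listen glance_api
  bind 192.168.24.250:9292
  balance roundrobin
  server controller-0 192.168.24.10:9292 check
  server controller-1 192.168.24.11:9292 check
  server controller-2 192.168.24.12:9292 check
```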

Active-passive

Services that are unable to run in active-active mode must run in active-passive mode. In this mode, only one instance of the service is active at a time. For example, HAProxy uses stick-table options to direct incoming Galera database connection requests to a single back-end service. This helps prevent too many simultaneous connections to the same data from multiple Galera nodes.
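An illustrative haproxy.cfg fragment for the Galera back end follows; the stick-table keeps all connections pinned to one server, with the others held as backups. The addresses and server names are examples only:

```
listen mysql
  bind 192.168.24.250:3306
  option tcpka
  stick on dst
  stick-table type ip size 1000
  server controller-0 192.168.24.10:3306 check
  server controller-1 192.168.24.11:3306 backup check
  server controller-2 192.168.24.12:3306 backup check
```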

1.2. Planning high availability hardware assignments

When you plan hardware assignments, consider the number of nodes that you want to run in your deployment, as well as the number of virtual machine (VM) instances that you plan to run on Compute nodes.

Controller nodes
Most non-storage services run on Controller nodes. All services are replicated across the three nodes and are configured as active-active or active-passive services. A high availability (HA) environment requires a minimum of three nodes.
Red Hat Ceph Storage nodes
Storage services run on these nodes and provide pools of Red Hat Ceph Storage to the Compute nodes. A minimum of three nodes is required.
Compute nodes
Virtual machine (VM) instances run on Compute nodes. You can deploy as many Compute nodes as you need to meet your capacity requirements and to support migration and reboot operations. You must connect Compute nodes to the storage network and to the project network to ensure that VMs can access storage nodes, VMs on other Compute nodes, and public networks.
You must configure a STONITH device for each node that is a part of the Pacemaker cluster in a highly available overcloud. Deploying a highly available overcloud without STONITH is not supported. For more information on STONITH and Pacemaker, see Fencing in a Red Hat High Availability Cluster and Support Policies for RHEL High Availability Clusters.
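You can check the fencing configuration with pcs. A sketch; the exact subcommands vary slightly between pcs versions, and device names depend on your deployment:

```shell
# List the configured STONITH devices and their current status
sudo pcs stonith status

# Confirm that fencing is enabled cluster-wide
sudo pcs property show stonith-enabled
```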

1.3. Planning high availability networking

When you plan the virtual and physical networks, consider the provisioning network switch configuration and the external network switch configuration.

In addition to the network configuration, you must deploy the following components:

Provisioning network switch
  • This switch must be able to connect the undercloud to all the physical computers in the overcloud.
  • The NIC on each overcloud node that is connected to this switch must be able to PXE boot from the undercloud.
  • The portfast parameter must be enabled.
Controller/External network switch
  • This switch must be configured to perform VLAN tagging for the other VLANs in the deployment.
  • Allow only VLAN 100 traffic to external networks.
Networking hardware and keystone endpoint
  • To prevent a Controller node network card or network switch failure from disrupting overcloud service availability, ensure that the keystone admin endpoint is located on a network that uses bonded network cards or networking hardware redundancy.

    If you move the keystone endpoint to a different network, such as internal_api, ensure that the undercloud can reach the VLAN or subnet. For more information, see the Red Hat Knowledgebase solution How to migrate Keystone Admin Endpoint to internal_api network.
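As an illustration of the switch requirements above, on a Cisco IOS switch the provisioning access port and the external trunk port might be configured as follows. The interface names are examples only, and the syntax differs on other vendors' hardware:

```
! Provisioning network port: access port with PortFast enabled
! so that PXE boot is not delayed by spanning-tree convergence
interface GigabitEthernet1/0/1
 switchport mode access
 spanning-tree portfast

! External network port: trunk that carries only VLAN 100
interface GigabitEthernet1/0/2
 switchport mode trunk
 switchport trunk allowed vlan 100
```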

1.4. Accessing the high availability environment

To investigate high availability (HA) nodes, log in to the undercloud as the stack user and run the openstack server list command to view the status and details of the overcloud nodes.


Prerequisites

  • High availability is deployed and running.


Procedure

  1. In a running HA environment, log in to the undercloud as the stack user.
  2. Identify the IP addresses of your overcloud nodes:

    $ source ~/stackrc
    (undercloud) $ openstack server list
     | ID    | Name                   |...| Networks             |...|
     | d1... | overcloud-controller-0 |...| ctlplane=** |...|
  3. To log in to one of the overcloud nodes, run the following commands:

    (undercloud) $ ssh heat-admin@<node_ip>

    Replace <node_ip> with the IP address of the node that you want to log in to.

1.5. Additional resources