Chapter 4. Using Pacemaker

In the OpenStack configuration described in Figure 1.1, “OpenStack HA environment deployed through director”, most OpenStack services are running on the three controller nodes. To investigate high availability features of those services, log into any of the controllers as the heat-admin user and look at services controlled by Pacemaker.
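For example, from the undercloud node you might connect to a controller and view the cluster status as follows. This is a sketch only; the controller IP address shown is a placeholder, so substitute the ctlplane address of one of your controller nodes:

$ ssh heat-admin@192.168.24.10
$ sudo pcs status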

Output from the Pacemaker pcs status command includes general cluster information, virtual IP addresses, services, failed actions, and fencing information.

For general information about Pacemaker in Red Hat Enterprise Linux, see Configuring and Managing High Availability Clusters in the Red Hat Enterprise Linux documentation.

4.1. General Pacemaker Information

The following example shows the general Pacemaker information section of the pcs status command output:

$ sudo pcs status
    Cluster name: tripleo_cluster 1
    Stack: corosync
    Current DC: overcloud-controller-1 (version 2.0.1-4.el8-0eb7991564) - partition with quorum

    Last updated: Thu Feb  8 14:29:21 2018
    Last change: Sat Feb  3 11:37:17 2018 by root via cibadmin on overcloud-controller-2

    12 nodes configured 2
    37 resources configured 3

    Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ] 4
    GuestOnline: [ galera-bundle-0@overcloud-controller-0 galera-bundle-1@overcloud-controller-1 galera-bundle-2@overcloud-controller-2 rabbitmq-bundle-0@overcloud-controller-0 rabbitmq-bundle-1@overcloud-controller-1 rabbitmq-bundle-2@overcloud-controller-2 redis-bundle-0@overcloud-controller-0 redis-bundle-1@overcloud-controller-1 redis-bundle-2@overcloud-controller-2 ] 5

    Full list of resources:
[...]

The main sections of the output show the following information about the cluster:

1. Name of the cluster.
2. Number of nodes that are configured for the cluster.
3. Number of resources that are configured for the cluster.
4. Names of the controller nodes that are currently online.
5. Names of the guest nodes that are currently online. Each guest node consists of a complex Bundle Set resource. For more information about bundle sets, see Section 4.3, “OpenStack Services Configured in Pacemaker”.
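
If you only want to confirm membership and quorum rather than review the full status output, you can query the quorum subsystem directly. This is a brief sketch; the exact output format depends on your Pacemaker and Corosync versions:

$ sudo pcs quorum status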

4.2. Virtual IP Addresses Configured in Pacemaker

Each IPaddr2 resource sets a virtual IP address that clients use to request access to a service. If the Controller Node that is assigned to that IP address fails, the IP address is reassigned to a different controller.

In this example, you can see each controller node that is currently set to listen to a particular virtual IP address.

 ip-10.200.0.6	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
 ip-192.168.1.150	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-172.16.0.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-1
 ip-172.16.0.11	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-0
 ip-172.18.0.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2
 ip-172.19.0.10	(ocf::heartbeat:IPaddr2):	Started overcloud-controller-2

In the output, each IP address is initially attached to a particular controller. For example, 192.168.1.150 is started on overcloud-controller-0. However, if that controller fails, the IP address is reassigned to other controllers in the cluster.
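To quickly check which controller currently hosts a particular virtual IP address, you can filter the status output for the resource name. A minimal sketch, using the public IP address resource from the listing above:

$ sudo pcs status | grep ip-192.168.1.150
 ip-192.168.1.150  (ocf::heartbeat:IPaddr2):  Started overcloud-controller-0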

The following table describes the IP addresses in the example and shows how each address was originally allocated.

Table 4.1. IP address description and allocation source

IP Address     | Description                                                                                    | Allocated From
192.168.1.150  | Public IP address                                                                              | ExternalAllocationPools attribute in the network-environment.yaml file
10.200.0.6     | Controller virtual IP address                                                                  | Part of the dhcp_start and dhcp_end range set to 10.200.0.5-10.200.0.24 in the undercloud.conf file
172.16.0.10    | Provides access to OpenStack API services on a controller                                      | InternalApiAllocationPools attribute in the network-environment.yaml file
172.18.0.10    | Storage virtual IP address that provides access to the Glance API and to Swift Proxy services  | StorageAllocationPools attribute in the network-environment.yaml file
172.16.0.11    | Provides access to the Redis service on a controller                                           | InternalApiAllocationPools attribute in the network-environment.yaml file
172.19.0.10    | Provides access to storage management                                                          | StorageMgmtAllocationPools attribute in the network-environment.yaml file
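
If you want to trace an address back to its allocation source, you can search the deployment files on the undercloud node. This is a hedged sketch; the network-environment.yaml path assumes that your templates are stored under /home/stack/templates, which might differ in your environment:

$ grep -A1 "AllocationPools" /home/stack/templates/network-environment.yaml
$ grep -E "dhcp_start|dhcp_end" /home/stack/undercloud.conf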

You can view details about a specific IP address that is managed by Pacemaker with the pcs command. For example, you can view the configured netmask and the operation timeout values.

The following example shows the output of the pcs command when you run it on the ip-192.168.1.150 public IP address.

$ sudo pcs resource show ip-192.168.1.150
 Resource: ip-192.168.1.150 (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=192.168.1.150 cidr_netmask=32
  Operations: start interval=0s timeout=20s (ip-192.168.1.150-start-timeout-20s)
              stop interval=0s timeout=20s (ip-192.168.1.150-stop-timeout-20s)
              monitor interval=10s timeout=20s (ip-192.168.1.150-monitor-interval-10s)

If you are logged into the controller that is currently assigned to listen to the IP address 192.168.1.150, you can run the following commands to make sure that the controller is active and that the services are actively listening to that address:

$ ip addr show vlan100
  9: vlan100: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether be:ab:aa:37:34:e7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.151/24 brd 192.168.1.255 scope global vlan100
       valid_lft forever preferred_lft forever
    inet 192.168.1.150/32 brd 192.168.1.255 scope global vlan100
       valid_lft forever preferred_lft forever

$ sudo netstat -tupln | grep "192.168.1.150.*haproxy"
tcp        0      0 192.168.1.150:8778          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:8042          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:9292          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:8080          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:80            0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:8977          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:6080          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:9696          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:8000          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:8004          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:8774          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:5000          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:8776          0.0.0.0:*               LISTEN      61029/haproxy
tcp        0      0 192.168.1.150:8041          0.0.0.0:*               LISTEN      61029/haproxy

The ip command output shows that the vlan100 interface is listening to both the 192.168.1.150 and 192.168.1.151 IPv4 addresses.

The netstat command output shows the processes that are listening on the 192.168.1.150 address. Apart from the ntpd process, which listens on port 123, haproxy is the only process that is bound specifically to 192.168.1.150.

NOTE
Processes that are listening to all local addresses, such as 0.0.0.0, are also available through 192.168.1.150. These processes include sshd, mysqld, dhclient, ntpd, and so on.

The port numbers in the netstat command output can help you to identify the specific service that HAProxy is listening for. You can view the /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg file to see which services these port numbers represent.
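
For example, to find the HAProxy section that serves a specific port, you can search the configuration file for the port number. A simple sketch; the section names and port numbers depend on the services that are deployed:

$ sudo grep -B1 ":9696" /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg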

The following list shows several examples of port numbers and the default services that are assigned to them:

  • TCP port 6080: nova_novncproxy
  • TCP port 9696: neutron
  • TCP port 8000: heat_cfn
  • TCP port 80: horizon
  • TCP port 8776: cinder

Currently, HAProxy on all three controllers is configured to listen on the 192.168.1.150 IP address for most of the services that are defined in the haproxy.cfg file. However, because the virtual IP address is assigned to only one node at a time, only the controller-0 node is actually reachable at 192.168.1.150.

Therefore, if the controller-0 node fails, Pacemaker only needs to reassign the 192.168.1.150 IP address to another controller; all other services are already running on the fallback controller node.
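
If you want to observe this reassignment in a test environment, you can move the virtual IP resource to another controller and then remove the temporary constraint that the move creates. This is a sketch only and is not intended for production clusters:

$ sudo pcs resource move ip-192.168.1.150 overcloud-controller-1
$ sudo pcs status | grep ip-192.168.1.150
$ sudo pcs resource clear ip-192.168.1.150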

4.3. OpenStack Services Configured in Pacemaker

The majority of the services that are managed by the cluster in Red Hat OpenStack Platform 12 and later are configured as Bundle Set resources, or bundles. These services can be started in the same way on each controller node and are set to always run on each controller.

Bundle
A bundle resource handles configuring and replicating the same container on all controller nodes, mapping the necessary storage paths to the container directories, and setting specific attributes related to the resource itself.
Container
A container can run different kinds of resources, from simple systemd-based services such as HAProxy to complex services such as Galera, which require specific resource agents that control and set the state of the service on the different nodes.
Warning
  • Using podman or systemctl commands to manage bundles or containers is not supported. You can use these commands to check the status of the services, but you must use only Pacemaker to perform actions on these services.
  • Podman containers that are controlled by Pacemaker have their RestartPolicy set to no by Podman. This ensures that Pacemaker, and not Podman, controls the container start and stop actions. You can verify this setting as shown in the example after this list.
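
To verify the restart policy without changing anything, you can inspect the container metadata. A minimal sketch; the exact JSON layout can vary between Podman versions:

$ sudo podman inspect haproxy-bundle-podman-0 | grep -A2 RestartPolicy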

4.3.1. Simple Bundle Set resources (simple bundles)

A simple Bundle Set resource, or simple bundle, is a set of containers that each include the same Pacemaker services to be deployed across the controller nodes.

The following example shows the bundle settings from the pcs status command:

Podman container set: haproxy-bundle [192.168.24.1:8787/rhosp15/openstack-haproxy:pcmklatest]
  haproxy-bundle-podman-0      (ocf::heartbeat:podman):        Started overcloud-controller-0
  haproxy-bundle-podman-1      (ocf::heartbeat:podman):        Started overcloud-controller-1
  haproxy-bundle-podman-2      (ocf::heartbeat:podman):        Started overcloud-controller-2

For each bundle, you can see the following details:

  • The name that Pacemaker assigns to the service
  • The reference to the container that is associated with the bundle
  • The list of the replicas that are running on the different controllers with their status

4.3.1.1. Simple bundle settings

To see details about a particular bundle service, such as the haproxy-bundle service, use the pcs resource show command. For example:

$ sudo pcs resource show haproxy-bundle
Bundle: haproxy-bundle
 Podman: image=192.168.24.1:8787/rhosp15/openstack-haproxy:pcmklatest network=host options="--user=root --log-driver=journald -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS" replicas=3 run-command="/bin/bash /usr/local/bin/kolla_start"
 Storage Mapping:
  options=ro source-dir=/var/lib/kolla/config_files/haproxy.json target-dir=/var/lib/kolla/config_files/config.json (haproxy-cfg-files)
  options=ro source-dir=/var/lib/config-data/puppet-generated/haproxy/ target-dir=/var/lib/kolla/config_files/src (haproxy-cfg-data)
  options=ro source-dir=/etc/hosts target-dir=/etc/hosts (haproxy-hosts)
  options=ro source-dir=/etc/localtime target-dir=/etc/localtime (haproxy-localtime)
  options=ro source-dir=/etc/pki/ca-trust/extracted target-dir=/etc/pki/ca-trust/extracted (haproxy-pki-extracted)
  options=ro source-dir=/etc/pki/tls/certs/ca-bundle.crt target-dir=/etc/pki/tls/certs/ca-bundle.crt (haproxy-pki-ca-bundle-crt)
  options=ro source-dir=/etc/pki/tls/certs/ca-bundle.trust.crt target-dir=/etc/pki/tls/certs/ca-bundle.trust.crt (haproxy-pki-ca-bundle-trust-crt)
  options=ro source-dir=/etc/pki/tls/cert.pem target-dir=/etc/pki/tls/cert.pem (haproxy-pki-cert)
  options=rw source-dir=/dev/log target-dir=/dev/log (haproxy-dev-log)

The haproxy-bundle example also shows the resource settings for HAProxy. Although HAProxy provides high availability services by load-balancing traffic to selected services, you keep HAProxy itself highly available by configuring it as a Pacemaker bundle service.

From the example output, you see that the bundle configures a Podman container with several specific parameters:

  • image: Image used by the container, which refers to the local registry of the undercloud.
  • network: Container network type, which is "host" in the example.
  • options: Specific options for the container.
  • replicas: Number that indicates how many copies of the container should be created in the cluster. Each bundle includes three containers, one for each controller node.
  • run-command: System command used to spawn the container.

In addition to the Podman container specification, the bundle configuration also contains a Storage Mapping section, in which local paths on the host are mapped into the container. Therefore, to check the HAProxy configuration from the host, open the /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg file instead of the /etc/haproxy/haproxy.cfg file.
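
To confirm the mapping, you can compare the HAProxy configuration on the host with the copy that is used inside the container. A brief sketch, assuming that grep is available in the container image:

$ sudo grep "^listen" /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg | head -3
$ sudo podman exec haproxy-bundle-podman-0 grep "^listen" /etc/haproxy/haproxy.cfg | head -3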

4.3.1.2. Checking simple bundle status

You can check the status of the bundle by using the podman exec command to run a command inside the container:

$ sudo podman exec -it haproxy-bundle-podman-0 ps -efww | grep haproxy*
root           7       1  0 06:08 ?        00:00:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -Ws
haproxy       11       7  0 06:08 ?        00:00:17 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -Ws

The output shows that the process is running inside the container.

You can also check the bundle status directly from the host:

$ ps -ef | grep haproxy*
root       17774   17729  0 06:08 ?        00:00:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -Ws
42454      17819   17774  0 06:08 ?        00:00:21 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -Ws
root      288508  237714  0 07:04 pts/0    00:00:00 grep --color=auto haproxy*
[root@controller-0 ~]# ps -ef | grep -e 17774 -e 17819
root       17774   17729  0 06:08 ?        00:00:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -Ws
42454      17819   17774  0 06:08 ?        00:00:22 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -Ws
root      301950  237714  0 07:07 pts/0    00:00:00 grep --color=auto -e 17774 -e 17819

You can run the same commands on any bundle to see the current level of activity and details about the commands that the service runs.
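
As a read-only complement to the process checks, you can also list the containers in the bundle and their state with the podman ps command. Use this only for inspection; as noted in the warning earlier, do not use podman to start or stop these containers:

$ sudo podman ps --filter name=haproxy-bundle --format "{{.Names}}: {{.Status}}"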

4.3.2. Complex Bundle Set resources (complex bundles)

Complex Bundle Set resources, or complex bundles, are Pacemaker services that specify a resource configuration in addition to the basic container configuration, which is also included in simple bundles.

This additional configuration is needed to manage multi-state resources, which are services that can have different states depending on the controller node on which they run.

This example shows a list of complex bundles from the output of the pcs status command:

Podman container set: rabbitmq-bundle [192.168.24.1:8787/rhosp15/openstack-rabbitmq:pcmklatest]
  rabbitmq-bundle-0    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-0
  rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-1
  rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):      Started overcloud-controller-2
Podman container set: galera-bundle [192.168.24.1:8787/rhosp15/openstack-mariadb:pcmklatest]
  galera-bundle-0      (ocf::heartbeat:galera):        Master overcloud-controller-0
  galera-bundle-1      (ocf::heartbeat:galera):        Master overcloud-controller-1
  galera-bundle-2      (ocf::heartbeat:galera):        Master overcloud-controller-2
Podman container set: redis-bundle [192.168.24.1:8787/rhosp15/openstack-redis:pcmklatest]
  redis-bundle-0       (ocf::heartbeat:redis): Master overcloud-controller-0
  redis-bundle-1       (ocf::heartbeat:redis): Slave overcloud-controller-1
  redis-bundle-2       (ocf::heartbeat:redis): Slave overcloud-controller-2

In the output, you see that unlike RabbitMQ, the Galera and Redis bundles are run as multi-state resources inside their containers.

For the galera-bundle resource, all three controllers are running as Galera masters. For the redis-bundle resource, the overcloud-controller-0 container is running as the master, while the other two controllers are running as slaves.

This means that the Galera service is running under one set of constraints on all three controllers, while the redis service might run under different constraints on the master and the slave controllers.
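
To see only the role of each multi-state replica, you can filter the status output for the role keywords. A quick sketch:

$ sudo pcs status | grep -E "Master|Slave"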

The following example shows the output of the pcs resource show galera-bundle command:

[...]
Bundle: galera-bundle
 Podman: image=192.168.24.1:8787/rhosp15/openstack-mariadb:pcmklatest masters=3 network=host options="--user=root --log-driver=journald -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS" replicas=3 run-command="/bin/bash /usr/local/bin/kolla_start"
 Network: control-port=3123
 Storage Mapping:
  options=ro source-dir=/var/lib/kolla/config_files/mysql.json target-dir=/var/lib/kolla/config_files/config.json (mysql-cfg-files)
  options=ro source-dir=/var/lib/config-data/puppet-generated/mysql/ target-dir=/var/lib/kolla/config_files/src (mysql-cfg-data)
  options=ro source-dir=/etc/hosts target-dir=/etc/hosts (mysql-hosts)
  options=ro source-dir=/etc/localtime target-dir=/etc/localtime (mysql-localtime)
  options=rw source-dir=/var/lib/mysql target-dir=/var/lib/mysql (mysql-lib)
  options=rw source-dir=/var/log/mariadb target-dir=/var/log/mariadb (mysql-log-mariadb)
  options=rw source-dir=/dev/log target-dir=/dev/log (mysql-dev-log)
 Resource: galera (class=ocf provider=heartbeat type=galera)
  Attributes: additional_parameters=--open-files-limit=16384 cluster_host_map=overcloud-controller-0:overcloud-controller-0.internalapi.localdomain;overcloud-controller-1:overcloud-controller-1.internalapi.localdomain;overcloud-controller-2:overcloud-controller-2.internalapi.localdomain enable_creation=true wsrep_cluster_address=gcomm://overcloud-controller-0.internalapi.localdomain,overcloud-controller-1.internalapi.localdomain,overcloud-controller-2.internalapi.localdomain
  Meta Attrs: container-attribute-target=host master-max=3 ordered=true
  Operations: demote interval=0s timeout=120 (galera-demote-interval-0s)
              monitor interval=20 timeout=30 (galera-monitor-interval-20)
              monitor interval=10 role=Master timeout=30 (galera-monitor-interval-10)
              monitor interval=30 role=Slave timeout=30 (galera-monitor-interval-30)
              promote interval=0s on-fail=block timeout=300s (galera-promote-interval-0s)
              start interval=0s timeout=120 (galera-start-interval-0s)
              stop interval=0s timeout=120 (galera-stop-interval-0s)
[...]

This output shows that unlike in a simple bundle, the galera-bundle resource includes explicit resource configuration, which determines all aspects of the multi-state resource.

Note

Even though a service might be running on multiple controllers at the same time, the controller itself might not be listening at the IP address that is needed to actually reach those services.

For more information about troubleshooting the Galera resource, see Chapter 6, Using Galera.

4.4. Pacemaker Failed Actions

If any of the resources fail in any way, they will be listed under the Failed actions heading of the pcs status output. In the following example, the openstack-cinder-volume service stopped working on controller-0:

Failed Actions:
* openstack-cinder-volume_monitor_60000 on overcloud-controller-0 'not running' (7): call=74, status=complete, exitreason='none',
	last-rc-change='Wed Dec 14 08:33:14 2016', queued=0ms, exec=0ms

In this case, the systemd service openstack-cinder-volume needs to be re-enabled. In other cases, you need to track down and fix the problem, then clean up the resources. See Section 7.1, “Correcting Resource Problems on Controllers” for details.
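
After you fix the underlying problem, a typical cleanup sketch looks like the following; the resource name is taken from the example above:

$ sudo pcs resource cleanup openstack-cinder-volume
$ sudo pcs status | grep -A2 "Failed Actions"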

4.5. Other Pacemaker Information for Controllers

The last sections of the pcs status output show the openstack-cinder-volume systemd resource, information about your power management fencing (IPMI in this case), and the status of the Pacemaker service itself:

 openstack-cinder-volume	(systemd:openstack-cinder-volume): Started overcloud-controller-0

 my-ipmilan-for-controller-0	(stonith:fence_ipmilan): Started overcloud-controller-0
 my-ipmilan-for-controller-1	(stonith:fence_ipmilan): Started overcloud-controller-1
 my-ipmilan-for-controller-2	(stonith:fence_ipmilan): Started overcloud-controller-2

PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  overcloud-controller-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

The my-ipmilan-for-controller settings show the type of fencing that is used for each node (stonith:fence_ipmilan) and whether the IPMI service is stopped or running. The PCSD Status section shows that all three controllers are currently online. The Pacemaker service itself consists of three daemons: corosync, pacemaker, and pcsd. In this example, all three daemons are active and enabled.

4.6. Fencing Hardware

When a controller node fails a health check, the controller that acts as the Pacemaker designated coordinator (DC) uses the Pacemaker stonith service to fence the offending node. STONITH is an acronym for "Shoot The Other Node In The Head": in effect, the DC removes the failed node from the cluster.
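
To check which controller is currently acting as the DC and to confirm that fencing is enabled, you can run the following commands. A short sketch; stonith-enabled is a standard Pacemaker cluster property:

$ sudo pcs status | grep "Current DC"
$ sudo pcs property show stonith-enabled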

To see how your fencing devices are configured by stonith for your OpenStack Platform HA cluster, run the following command:

$ sudo pcs stonith show --full
 Resource: my-ipmilan-for-controller-0 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-0 ipaddr=10.100.0.51 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-0-monitor-interval-60s)
 Resource: my-ipmilan-for-controller-1 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-1 ipaddr=10.100.0.52 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-1-monitor-interval-60s)
 Resource: my-ipmilan-for-controller-2 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-2 ipaddr=10.100.0.53 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-2-monitor-interval-60s)

The show --full listing shows details about the three controller nodes that relate to fencing. The fence device uses IPMI power management (fence_ipmilan) to turn the machines on and off as required. Information about the IPMI interface for each node includes the IP address of the IPMI interface (10.100.0.51), the user name to log in as (admin) and the password to use (abc). You can also see the interval at which each host is monitored (60 seconds).

For more information on fencing with Pacemaker, see "Configuring fencing in a Red Hat High Availability cluster" in the Red Hat Enterprise Linux 8 Configuring and Managing High Availability Clusters guide.