High Availability for Compute Instances
Configure High Availability for Compute Instances
Chapter 1. Overview
This guide describes how to manage Instance High Availability (Instance HA). Instance HA allows Red Hat OpenStack Platform to automatically evacuate and re-spawn instances on a different Compute node when their host Compute node fails.
The evacuation process that is triggered by Instance HA is similar to what users can do manually, as described in Evacuate Instances.
Instance HA works on shared storage or local storage environments, which means that evacuated instances maintain the same network configuration (static IP, floating IP, and so on) and the same characteristics inside the new host, even if they are spawned from scratch.
Instance HA is managed by the following resource agents:
|Agent name||Name inside cluster||Role|
| ||fence-nova||Marks a Compute node for evacuation when the node becomes unavailable.|
| ||nova-evacuate||Evacuates instances from failed nodes. This agent runs on one of the Controller nodes.|
| ||compute-unfence-trigger||Releases a fenced node and enables the node to run instances again.|
Chapter 2. How Instance HA Works
OpenStack uses Instance HA to automate the process of evacuating instances from a Compute node when that node fails. The following procedure describes the sequence of events that are triggered when a Compute node fails.
At the time of failure, the IPMI agent performs first-layer fencing and physically resets the node to ensure that it is powered off. Evacuating instances from online Compute nodes might result in data corruption or in multiple identical instances running on the overcloud. When the node is powered off, it is considered fenced.
After the physical IPMI fencing, the fence-nova agent performs second-layer fencing and marks the fenced node with the "evacuate=yes" cluster per-node attribute. To do this, the agent runs the following command:
$ attrd_updater -n evacuate -A name="evacuate" host="FAILEDHOST" value="yes"
Where FAILEDHOST is the hostname of the failed Compute node.
The nova-evacuate agent continually runs in the background, periodically checking the cluster for nodes with the "evacuate=yes" attribute. When nova-evacuate detects that the fenced node contains this attribute, the agent starts evacuating the node using the process described in Evacuate Instances.
While the failed node boots up from the IPMI reset, the nova-compute process on that node starts automatically. Because the node was fenced earlier, it does not run any new instances until Pacemaker unfences it.
When Pacemaker sees that the Compute node is online again, it tries to start the compute-unfence-trigger resource on the node, reverting the force-down API call and setting the node as enabled again.
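The attribute check in the procedure above can be sketched as a small shell filter. This is an illustration only, not the code of the nova-evacuate agent: the function name hosts_marked_for_evacuation is hypothetical, and the sketch assumes that attribute records arrive one per line in the name="evacuate" host="..." value="..." form shown earlier.

```shell
# Illustrative filter (hypothetical name): print hosts whose "evacuate"
# attribute is "yes".
# Assumes one record per line on stdin, in the form:
#   name="evacuate" host="HOST" value="VALUE"
hosts_marked_for_evacuation() {
    sed -n 's/^name="evacuate" host="\([^"]*\)" value="yes"$/\1/p'
}

# Example with two sample records; only the first host is flagged.
printf '%s\n' \
    'name="evacuate" host="overcloud-compute-0" value="yes"' \
    'name="evacuate" host="overcloud-compute-1" value="no"' \
    | hosts_marked_for_evacuation
```

The host names in the sample records are placeholders. On a real cluster, the records would come from the cluster attribute database rather than from printf.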
2.1. Designating specific instances to be evacuated
By default, all instances are to be evacuated, but it is also possible to tag images or flavors for evacuation.
To tag an image:
$ openstack image set --tag evacuable ID-OF-THE-IMAGE
To tag a flavor:
$ nova flavor-key ID-OF-THE-FLAVOR set evacuable=true
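When several images or flavors need the tag, it can help to review the commands before running them. The following sketch only prints the commands and does not call the OpenStack CLI. The function name print_evacuable_tag_commands and the sample IDs are hypothetical.

```shell
# Print (do not run) the tagging command for each ID.
# Usage: print_evacuable_tag_commands image|flavor ID...
print_evacuable_tag_commands() {
    kind="$1"
    shift
    for id in "$@"; do
        case "$kind" in
            image)  printf 'openstack image set --tag evacuable %s\n' "$id" ;;
            flavor) printf 'nova flavor-key %s set evacuable=true\n' "$id" ;;
            *)      printf 'unknown kind: %s\n' "$kind" >&2; return 1 ;;
        esac
    done
}

# Sample IDs; replace them with real image or flavor IDs.
print_evacuable_tag_commands image IMAGE-ID-1 IMAGE-ID-2
```

After checking the printed lines, you can paste them into a shell that has the overcloud credentials loaded.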
Chapter 3. Installing and configuring Instance HA
In Red Hat OpenStack Platform 13 and later, Instance HA is deployed and configured with the director. However, you must perform a few additional steps to prepare for the deployment.
You can configure Instance HA for your overcloud at any time after creating the undercloud.
Add the ComputeInstanceHA role to your roles data file and regenerate the file. For example:
$ openstack overcloud roles generate -o ~/my_roles_data.yaml Controller Compute ComputeInstanceHA
The ComputeInstanceHA role includes all the services in the default Compute role as well as the PacemakerRemote services. For general information about custom roles and about the roles-data.yaml, see the Roles section in the Advanced Overcloud Customization guide.
Create the compute-instance-ha flavor to tag Compute nodes that you want to designate for Instance HA. For example:
$ source ~/stackrc
$ openstack flavor create --id auto --ram 6144 --disk 40 --vcpus 4 compute-instance-ha
$ openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute-instance-ha" compute-instance-ha
Tag each Compute node that you want to designate for Instance HA with the compute-instance-ha profile:
$ openstack baremetal node set --property capabilities='profile:compute-instance-ha,boot_option:local' <NODE UUID>
Map the ComputeInstanceHA role to the compute-instance-ha flavor by creating an environment file with the following content:
parameter_defaults:
  OvercloudComputeInstanceHAFlavor: compute-instance-ha
Enable fencing on all Controller and Compute nodes in the overcloud by creating an environment file with fencing information. Make sure to create the environment file in an accessible location, such as ~/templates. For example:
parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
    - agent: fence_ipmilan
      host_mac: 00:ec:ad:cb:3c:c7
      params:
        login: admin
        ipaddr: 192.168.24.1
        ipport: 6230
        passwd: password
        lanplus: 1
    - agent: fence_ipmilan
      host_mac: 00:ec:ad:cb:3c:cb
      params:
        login: admin
        ipaddr: 192.168.24.1
        ipport: 6231
        passwd: password
        lanplus: 1
    - agent: fence_ipmilan
      host_mac: 00:ec:ad:cb:3c:cf
      params:
        login: admin
        ipaddr: 192.168.24.1
        ipport: 6232
        passwd: password
        lanplus: 1
    - agent: fence_ipmilan
      host_mac: 00:ec:ad:cb:3c:d3
      params:
        login: admin
        ipaddr: 192.168.24.1
        ipport: 6233
        passwd: password
        lanplus: 1
    - agent: fence_ipmilan
      host_mac: 00:ec:ad:cb:3c:d7
      params:
        login: admin
        ipaddr: 192.168.24.1
        ipport: 6234
        passwd: password
        lanplus: 1
For more information about fencing and STONITH configuration, see the Fencing the Controller Nodes section of the Advanced Overcloud Customization guide.
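Because every entry in the devices list has the same shape, the fencing environment file can be generated instead of typed by hand. The following is a minimal sketch under that assumption: the function name fence_device_entry is hypothetical, and the MAC addresses, ports, and credentials are the sample values from the example above.

```shell
# Print one fence_ipmilan entry for the "devices" list.
# Usage: fence_device_entry MAC IPADDR IPPORT LOGIN PASSWD
# The MAC is quoted so that a YAML parser always reads it as a string.
fence_device_entry() {
    cat <<EOF
    - agent: fence_ipmilan
      host_mac: "$1"
      params:
        login: $4
        ipaddr: $2
        ipport: $3
        passwd: $5
        lanplus: 1
EOF
}

# Write a complete fencing environment file to stdout for two sample nodes.
printf 'parameter_defaults:\n  EnableFencing: true\n  FencingConfig:\n    devices:\n'
fence_device_entry 00:ec:ad:cb:3c:c7 192.168.24.1 6230 admin password
fence_device_entry 00:ec:ad:cb:3c:cb 192.168.24.1 6231 admin password
```

Redirect the output to a file in an accessible location, such as ~/templates, and review it before you pass it to the deployment command.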
By default, Instance HA uses shared storage. If you already defined shared storage separately, you can disable shared storage by adding a parameter to the environment file that you created in the previous step.
parameter_defaults:
  ExtraConfig:
    tripleo::instanceha::no_shared_storage: false
Run the openstack overcloud deploy command with the -e option and specify all the environment files that you created, as well as the compute-instanceha.yaml environment file. For example:
$ openstack overcloud deploy -e <FLAVOR_ENV_FILE> -e <FENCING_ENV_FILE> -e /usr/share/openstack-tripleo-heat-templates/environments/compute-instanceha.yaml
Note
Do not modify the compute-instanceha.yaml environment file.
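Each environment file is passed with its own -e option. As a hedged sketch, the following helper assembles the command line from a list of files and prints it without running it. The function name print_deploy_command and the two file names under templates/ are hypothetical, and any other options that your deployment requires are omitted.

```shell
# Print an "openstack overcloud deploy" command line with one -e per file.
# Usage: print_deploy_command ENV_FILE...
# Note: paths that contain spaces are not quoted in the printed output.
print_deploy_command() {
    cmd='openstack overcloud deploy'
    for env_file in "$@"; do
        cmd="$cmd -e $env_file"
    done
    printf '%s\n' "$cmd"
}

# Hypothetical file names for the flavor and fencing environment files.
print_deploy_command \
    templates/flavor-env.yaml \
    templates/fencing-env.yaml \
    /usr/share/openstack-tripleo-heat-templates/environments/compute-instanceha.yaml
```

Printing the command first makes it easy to confirm that no environment file is missing its -e option before you start the deployment.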
After the deployment is complete, each Compute node should include a
STONITH device and a
Chapter 4. Testing Evacuation with Instance HA
The following procedure involves deliberately crashing a Compute node. Doing this forces the automated evacuation of instances through Instance HA.
Boot one or more instances on the overcloud before crashing the Compute node that hosts the instances to test.
stack@director $ . overcloudrc
stack@director $ nova boot --image cirros --flavor 2 test-failover
stack@director $ nova list --fields name,status,host
Log in to the Compute node that hosts the instances, using the
stack@director $ . stackrc
stack@director $ ssh -l heat-admin compute-n
heat-admin@compute-n $
Crash the Compute node.
heat-admin@compute-n $ echo c | sudo tee /proc/sysrq-trigger
Wait a few minutes and then verify that these instances re-spawned on another Compute node.
stack@director $ nova list --fields name,status,host
stack@director $ nova service-list
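To compare the host before and after the crash without reading the table by eye, the Host column can be extracted with a small filter. This is a sketch, not part of the product: the function name instance_host is hypothetical, and it assumes that the table columns appear in the order ID, Name, Status, Host. The sample table stands in for real command output.

```shell
# Print the Host column for the instance called NAME.
# Usage: nova list --fields name,status,host | instance_host NAME
# Assumes the table columns are: ID | Name | Status | Host
instance_host() {
    awk -F'|' -v name="$1" '
        NF >= 5 {
            n = $3; h = $5
            gsub(/^ +| +$/, "", n)   # trim padding around the cell text
            gsub(/^ +| +$/, "", h)
            if (n == name) print h
        }'
}

# Sample table standing in for real "nova list" output.
sample='+------+---------------+--------+---------------------+
| ID   | Name          | Status | Host                |
+------+---------------+--------+---------------------+
| 1234 | test-failover | ACTIVE | overcloud-compute-1 |
+------+---------------+--------+---------------------+'

printf '%s\n' "$sample" | instance_host test-failover
```

Record the value before you crash the node and again after the wait. If the two values differ and the status is ACTIVE, the instance was evacuated to another Compute node.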
Chapter 5. Disabling Instance HA from previous versions
If you upgrade to Red Hat OpenStack Platform 13 from previous versions, you must manually disable Instance HA before you upgrade. This includes major and minor upgrades, as well as fast-forward upgrades.
To disable Instance HA, run the following command as the stack user on the undercloud:
stack@director $ ansible-playbook /home/stack/ansible-instanceha/playbooks/overcloud-instance-ha.yml \
    -e release="rhos-12" -e instance_ha_action="uninstall"
If you used the stonith_devices option when you enabled Instance HA, you need to specify this option when you disable Instance HA. For example, if your Instance HA configuration excludes STONITH devices, use the following command syntax:
stack@director $ ansible-playbook /home/stack/ansible-instanceha/playbooks/overcloud-instance-ha.yml \
    -e release="rhos-12" -e instance_ha_action="uninstall" -e stonith_devices="none"