Chapter 2. How Instance HA Works
OpenStack uses Instance HA to automate the process of evacuating instances from a Compute node when that node fails. The following procedure describes the sequence of events that are triggered when a Compute node fails.
-
At the time of failure, the IPMI agent performs first-layer fencing and physically resets the node to ensure that it is powered off. Evacuating instances from online Compute nodes might result in data corruption or in multiple identical instances running on the overcloud. When the node is powered off, it is considered fenced.
-
After the physical IPMI fencing, the fence-nova agent performs second-layer fencing and marks the fenced node with the “evacuate=yes” cluster per-node attribute. To do this, the agent runs the following command:
$ attrd_updater -n evacuate -A name="evacuate" host="FAILEDHOST" value="yes"
Where FAILEDHOST is the hostname of the failed Compute node.
Note: By default, all instances are to be evacuated, but it is also possible to tag images or flavors for evacuation.
To tag an image:
$ openstack image set --tag evacuable ID-OF-THE-IMAGE
To tag a flavor:
$ nova flavor-key ID-OF-THE-FLAVOR set evacuable=true
-
The nova-evacuate agent continually runs in the background, periodically checking the cluster for nodes with the “evacuate=yes” attribute. When nova-evacuate detects that the fenced node contains this attribute, the agent starts evacuating the node using the process described in Evacuate Instances.
-
While the failed node is booting up from the IPMI reset, the nova-compute process on that node starts automatically. Because the node was fenced earlier, it cannot run any new instances until Pacemaker unfences it.
-
When Pacemaker sees that the Compute node is online again, it tries to start the compute-unfence-trigger resource on the node, reverting the force-down API call and setting the node as enabled again.
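
The periodic check in the nova-evacuate step above can be pictured with a small shell sketch. It is an illustration only, not the agent's actual code: the function name hosts_to_evacuate and the hostnames compute-0 and compute-1 are hypothetical, and the input line format (name="evacuate" host="..." value="...") is an assumption modeled on the attribute text shown in the fencing step.

```shell
#!/bin/sh
# Hypothetical helper: print the hosts whose "evacuate" attribute is "yes".
# Reads lines shaped like: name="evacuate" host="HOSTNAME" value="yes"
# (assumed format; this is not how the nova-evacuate agent is implemented).
hosts_to_evacuate() {
    sed -n 's/^name="evacuate" host="\([^"]*\)" value="yes"$/\1/p'
}

# Sample input standing in for a live cluster query (made-up hostnames).
sample='name="evacuate" host="compute-0" value="no"
name="evacuate" host="compute-1" value="yes"'

# Only the node marked for evacuation is listed.
printf '%s\n' "$sample" | hosts_to_evacuate   # prints: compute-1
```

On a real cluster, the input would come from querying the per-node attribute rather than from a fixed string, and the resulting hostnames would be passed to the evacuation process instead of being printed.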
