
Appendix A. Automated Evacuation Through Instance HA

With Instance HA, OpenStack automatically evacuates instances from a Compute node when that node fails. The following steps describe the sequence of events that a Compute node failure triggers.

  1. When a Compute node fails, the IPMI agent performs first-level fencing: it physically resets the node to ensure that it is powered off. This matters because evacuating instances from a Compute node that is still online could result in data corruption or in multiple identical instances running on the overcloud. Once the node is powered off, it is considered fenced.
  2. After the physical IPMI fencing, the fence-nova agent performs second-level fencing and marks the fenced node with the “evacuate=yes” cluster per-node attribute. To do this, the agent runs:

    $ attrd_updater -n evacuate -A name="evacuate" host="FAILEDHOST" value="yes"

    Where FAILEDHOST is the hostname of the failed Compute node.

  3. The nova-evacuate agent runs continuously in the background and periodically checks the cluster for nodes with the “evacuate=yes” attribute. When nova-evacuate detects that the fenced node has this attribute, it starts evacuating the node's instances using the same process described in Evacuate Instances.
  4. While the failed node boots up after the IPMI reset, the nova-compute-checkevacuate agent waits (120 seconds by default) and then checks whether nova-evacuate has finished evacuating the node. If evacuation is not finished, the agent waits the same interval and checks again.
  5. Once nova-compute-checkevacuate verifies that the instances are fully evacuated, it triggers another process to make the fenced node available again for hosting instances.
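The key invariant in steps 1 to 3 is ordering: a node is marked “evacuate=yes” only after it is powered off, and the evacuation agent acts only on marked nodes. The following bash sketch models that ordering as a small simulation. It is hypothetical: the `power` and `evacuate` arrays and the functions `ipmi_power_off`, `mark_for_evacuation`, and `poll_once` are invented for illustration, and nothing in it calls IPMI, `attrd_updater`, or Nova. Setting the attribute back to `no` after a node is handled is also an assumption of the sketch, used only so that a second polling pass does not evacuate the same node twice.

```shell
#!/usr/bin/env bash
# Hypothetical simulation of steps 1-3. The arrays and functions below are
# stand-ins for illustration; nothing here calls IPMI, attrd_updater, or Nova.

declare -A power=()      # simulated power state of each Compute node
declare -A evacuate=()   # stand-in for the per-node "evacuate" cluster attribute

# Stand-in for first-level (IPMI) fencing: the node ends up powered off.
ipmi_power_off() {
  power[$1]=off
}

# Stand-in for second-level fencing: mark a node for evacuation,
# but only if it has already been powered off.
mark_for_evacuation() {
  local host=$1
  if [[ ${power[$host]:-on} != off ]]; then
    echo "refusing to mark $host: node is not fenced" >&2
    return 1
  fi
  evacuate[$host]=yes
}

# One pass of a nova-evacuate-style polling loop.
poll_once() {
  local host
  for host in "${!evacuate[@]}"; do
    if [[ ${evacuate[$host]} == yes ]]; then
      echo "evacuating instances from $host"
      # Assumption of this sketch: record that the node has been handled
      # so that the next pass does not evacuate it again.
      evacuate[$host]=no
    fi
  done
}

# Demo: compute-0 fails, compute-1 stays healthy.
power[compute-0]=on
power[compute-1]=on

mark_for_evacuation compute-0 || true   # refused: compute-0 is still powered on
ipmi_power_off compute-0                # first-level fencing
mark_for_evacuation compute-0           # second-level fencing: evacuate=yes
poll_once                               # prints: evacuating instances from compute-0
```

The refused first call shows why fencing comes first: a node that is still running is never handed to the evacuation loop, so its instances cannot end up running in two places.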
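Steps 4 and 5 describe a wait-then-check loop. The bash sketch below is hypothetical: `CHECK_INTERVAL`, `evacuation_finished`, `enable_node`, and `wait_for_evacuation` are illustrative names, and the evacuation status is simulated with a counter instead of being queried from the cluster. Only the 120-second default interval comes from the procedure above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of steps 4-5: a nova-compute-checkevacuate-style wait loop.
# evacuation_finished and enable_node are placeholders, not real commands.

CHECK_INTERVAL=${CHECK_INTERVAL:-120}   # seconds to wait before each check (documented default: 120)

pending_checks=3   # simulation only: evacuation completes on the third check

# Placeholder for "has nova-evacuate finished evacuating this host?"
evacuation_finished() {
  pending_checks=$(( pending_checks - 1 ))
  (( pending_checks <= 0 ))
}

# Placeholder for the process that makes the node available for instances again.
enable_node() {
  echo "$1 is available for new instances again"
}

wait_for_evacuation() {
  local host=$1 checks=0
  while true; do
    sleep "$CHECK_INTERVAL"              # wait first, then check (step 4)
    checks=$(( checks + 1 ))
    if evacuation_finished "$host"; then
      echo "$host evacuated after $checks checks"
      return 0
    fi
    # Not finished yet: loop and wait the same interval again.
  done
}

# Demo: use a zero-second interval so the example finishes immediately.
CHECK_INTERVAL=0
wait_for_evacuation compute-0 && enable_node compute-0
```

Waiting before the first check, rather than checking immediately, mirrors step 4, in which the agent waits while the node boots from the IPMI reset. The node is re-enabled only after the loop returns success, which corresponds to step 5.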