Chapter 5. Performing maintenance on Compute nodes and Controller nodes with Instance HA

To perform maintenance on a Compute node or a Controller node in an overcloud with Instance HA, put the node in standby mode and disable its Pacemaker resources. After you complete the maintenance work, start the node, re-enable its Pacemaker resources, and check that the resources are healthy.

Prerequisites

  • A running overcloud with Instance HA enabled

Procedure

  1. Log in to a Controller node and put the Compute or Controller node into standby mode:

    # pcs node standby <node_name>

    You must run this command from a node other than the node that you are putting into standby. Replace <node_name> with the Pacemaker node name of the node, as shown in the pcs status output.

  2. Disable the Pacemaker resources on the node:

    # pcs resource disable <remote_resource_name>

    Replace <remote_resource_name> with the name of the ocf::pacemaker:remote resource for the node.

  3. Perform the maintenance work on the node.
  4. Restore the IPMI connection and start the node. Wait until the node is ready before you continue.
  5. Enable the Pacemaker resources on the node and take the node out of standby mode:

    # pcs resource enable <remote_resource_name>
    # pcs node unstandby <node_name>

  6. If you set the bare-metal node to maintenance mode, log in to the undercloud, source the undercloud credential file, and unset maintenance mode for the node:

    # source stackrc
    # openstack baremetal node maintenance unset <baremetal node UUID>
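
The standby and re-enable steps above can be wrapped in two small shell functions. This is a minimal sketch, not a supported tool: the function names enter_maintenance and exit_maintenance, the run wrapper, and the DRY_RUN switch are hypothetical. Pass the real Pacemaker node name and the name of the node's ocf::pacemaker:remote resource as they appear in the pcs status output for your cluster.

```shell
#!/bin/sh
# Minimal sketch: wrap the standby/disable and enable/unstandby steps.
# enter_maintenance, exit_maintenance, run, and DRY_RUN are hypothetical
# helpers, not part of pcs. Arguments: the Pacemaker node name and the
# name of the node's ocf::pacemaker:remote resource (see "pcs status").
set -eu

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Steps 1 and 2: put the node in standby, then disable its remote resource.
enter_maintenance() {
  local node_name remote_resource
  node_name=$1
  remote_resource=$2
  run pcs node standby "$node_name"
  run pcs resource disable "$remote_resource"
}

# Step 5: re-enable the remote resource, then take the node out of standby.
exit_maintenance() {
  local node_name remote_resource
  node_name=$1
  remote_resource=$2
  run pcs resource enable "$remote_resource"
  run pcs node unstandby "$node_name"
}
```

For example, with the hypothetical names compute-0 and compute-0-remote, running DRY_RUN=1 enter_maintenance compute-0 compute-0-remote prints the two pcs commands instead of running them, which lets you review them before you change the cluster.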

Verification

  1. Check that the Pacemaker resources are active and healthy:

    # pcs status

  2. If any Pacemaker resources fail to start when the node starts, reset the status and the fail count of each failed resource:

    # pcs resource cleanup <resource_name>

  3. If you evacuated instances from a Compute node before you stopped the node, source the overcloud credential file, for example overcloudrc, and check that the instances now run on a different node:

    # openstack server list --long
    # nova migration-list
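
As an additional check, a small shell function can confirm that no instances remain on the Compute node that you serviced. This sketch filters with the --host option of openstack server list rather than reading the full --long listing. It assumes that overcloud admin credentials are already sourced and that you pass the Compute host name as the Compute service reports it; the function name check_host_empty is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: confirm that no instances remain on a Compute host.
# Assumes overcloud admin credentials are already sourced (for example,
# overcloudrc). check_host_empty is a hypothetical helper.
set -eu

# Print the result; return 0 when the host runs no instances, 1 otherwise.
check_host_empty() {
  local host ids
  host=$1
  # List only the IDs of instances that the Compute service places on $host.
  ids=$(openstack server list --all-projects --host "$host" -f value -c ID)
  if [ -n "$ids" ]; then
    echo "Instances still on $host:"
    echo "$ids"
    return 1
  fi
  echo "No instances on $host"
}
```

A nonzero return status means that at least one instance still reports the serviced host, and the printed IDs tell you which instances to inspect further.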