Chapter 16. Scaling overcloud nodes


Do not use openstack server delete to remove nodes from the overcloud. Follow the procedures in this section to remove and replace nodes correctly.

If you want to add or remove nodes after the creation of the overcloud, you must update the overcloud.


Ensure that your bare metal nodes are not in maintenance mode before you begin scaling out or removing an overcloud node.
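The maintenance check above can be scripted. The following minimal sketch assumes that you capture the output of `openstack baremetal node list -f json` and that each entry carries `UUID`, `Name`, and `Maintenance` keys. The helper name `nodes_in_maintenance` and the sample data are hypothetical.

```python
import json
from typing import List


def nodes_in_maintenance(node_list_json: str) -> List[str]:
    """Return the name (or UUID) of every node whose Maintenance flag is set."""
    flagged = []
    for node in json.loads(node_list_json):
        if node.get("Maintenance"):
            # Fall back to the UUID when the node has no name.
            flagged.append(node.get("Name") or node.get("UUID"))
    return flagged


# Hypothetical sample shaped like the assumed JSON output.
sample = json.dumps([
    {"UUID": "aaaa-1111", "Name": "compute-0", "Maintenance": False},
    {"UUID": "bbbb-2222", "Name": "compute-1", "Maintenance": True},
])
print(nodes_in_maintenance(sample))
```

An empty result suggests that no node blocks the scaling operation. Treat this as a convenience check, not a replacement for reviewing the node list yourself.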

Use the following table to determine support for scaling each node type:

Table 16.1. Scale support for each node type

| Node type            | Scale up? | Scale down? | Notes |
| Controller nodes     |           |             | You can replace Controller nodes using the procedures in Chapter 17, Replacing Controller nodes. |
| Ceph Storage nodes   |           |             | You must have at least 1 Ceph Storage node from the initial overcloud creation. |
| Object Storage nodes |           |             |       |

Ensure that you have at least 10 GB free space before you scale the overcloud. This free space accommodates image conversion and caching during the node provisioning process.
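The free-space requirement can be checked with a few lines of Python. This is a minimal sketch: the 10 GB threshold comes from this section, but the path to check is an assumption, because the text does not name the file system that must hold the free space. The helper name `has_enough_free_space` is hypothetical.

```python
import shutil

REQUIRED_FREE_GB = 10  # threshold stated in this section


def has_enough_free_space(path: str = "/", required_gb: float = REQUIRED_FREE_GB) -> bool:
    """Return True if the file system that holds `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


# The path "." is only an illustration; point it at the file system you need to check.
print(has_enough_free_space("."))
```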

16.1. Adding nodes to the overcloud

Complete the following steps to add more nodes to the director node pool.


  1. Create a new JSON file (newnodes.json) that contains the details of the new nodes that you want to register.

  2. Run the following command to register the new nodes:

    $ source ~/stackrc
    (undercloud) $ openstack overcloud node import newnodes.json
  3. After you register the new nodes, run the following command for each new node to launch the introspection process:

    (undercloud) $ openstack overcloud node introspect [NODE UUID] --provide

    This process detects and benchmarks the hardware properties of the nodes.

  4. Configure the image properties for the node:

    (undercloud) $ openstack overcloud node configure [NODE UUID]
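The following sketch shows one possible way to generate the node definition for step 1. It assumes that newnodes.json uses the same field layout as the node definition file from the initial node registration (`mac`, `cpu`, `memory`, `disk`, `arch`, `pm_type`, `pm_user`, `pm_password`, `pm_addr`). Every value is a placeholder, and the helper name `render_nodes_file` is hypothetical.

```python
import json

# Placeholder values only: replace them with the details of your own hardware.
new_nodes = {
    "nodes": [
        {
            "mac": ["dd:dd:dd:dd:dd:dd"],
            "cpu": "4",
            "memory": "6144",
            "disk": "40",
            "arch": "x86_64",
            "pm_type": "ipmi",
            "pm_user": "admin",
            "pm_password": "<ipmi_password>",
            "pm_addr": "192.168.24.207",
        }
    ]
}


def render_nodes_file(definition: dict) -> str:
    """Serialize the node definition as indented JSON text."""
    return json.dumps(definition, indent=2) + "\n"


print(render_nodes_file(new_nodes))
```

Write the rendered text to newnodes.json before you run the import command in step 2.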

16.2. Increasing node counts for roles

Complete the following steps to scale overcloud nodes for a specific role, such as a Compute node.


  1. Tag each new node with the role you want. For example, to tag a node with the Compute role, run the following command:

    (undercloud) $ openstack baremetal node set --property capabilities='profile:compute,boot_option:local' [NODE UUID]
  2. To scale the overcloud, edit the environment file that contains your node counts and redeploy the overcloud. For example, to scale your overcloud to 5 Compute nodes, edit the ComputeCount parameter:

      parameter_defaults:
        ComputeCount: 5
  3. Rerun the deployment command with the updated file, which in this example is called node-info.yaml:

    (undercloud) $ openstack overcloud deploy --templates -e /home/stack/templates/node-info.yaml [OTHER_OPTIONS]

    Ensure that you include all environment files and options from your initial overcloud creation. This includes the same scale parameters for non-Compute nodes.

  4. Wait until the deployment operation completes.
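If you prefer to script the count change in step 2, the following sketch rewrites a single count parameter in the text of an environment file. It assumes that the parameter appears exactly once, on its own line, in the form `ComputeCount: <number>`. The helper name `set_node_count` and the sample file content are hypothetical.

```python
import re


def set_node_count(env_text: str, parameter: str, count: int) -> str:
    """Replace the integer value of `parameter` and return the new file text."""
    pattern = re.compile(
        r"^([ \t]*" + re.escape(parameter) + r":[ \t]*)\d+[ \t]*$",
        re.MULTILINE,
    )
    updated, replaced = pattern.subn(lambda m: m.group(1) + str(count), env_text)
    if replaced != 1:
        # Refuse to guess when the parameter is missing or defined more than once.
        raise ValueError(f"expected exactly one {parameter} line, found {replaced}")
    return updated


# Hypothetical content of node-info.yaml.
sample = "parameter_defaults:\n  ControllerCount: 3\n  ComputeCount: 3\n"
print(set_node_count(sample, "ComputeCount", 5))
```

The function leaves every other line untouched, so the scale parameters for the other roles stay the same as in your initial deployment.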

16.3. Removing Compute nodes

There might be situations where you need to remove Compute nodes from the overcloud. For example, you might need to replace a problematic Compute node.


Before you remove a Compute node from the overcloud, migrate the workload from the node to other Compute nodes. For more information, see Migrating virtual machine instances between Compute nodes.


  • If Instance HA is enabled, choose one of the following options:

    • If the Compute node is accessible, log in to the Compute node as the root user and perform a clean shutdown with the shutdown -h now command.
    • If the Compute node is not accessible, log in to a Controller node as the root user, disable the STONITH device for the Compute node, and shut down the bare metal node:

      [root@controller-0 ~]# pcs stonith disable <stonith_resource_name>
      [stack@undercloud ~]$ source stackrc
      [stack@undercloud ~]$ openstack baremetal node power off <UUID>


  1. Source the overcloud configuration:

    $ source ~/overcloudrc
  2. Disable the Compute service on the outgoing node on the overcloud to prevent the node from scheduling new instances:

    (overcloud)$ openstack compute service list
    (overcloud)$ openstack compute service set <hostname> nova-compute --disable

    Use the --disable-reason option to add a short explanation of why the service is being disabled. This is useful if you intend to redeploy the Compute service at a later point.

  3. Source the undercloud configuration:

    (overcloud)$ source ~/stackrc
  4. Identify the UUID of the overcloud stack:

    (undercloud)$ openstack stack list
  5. Identify the UUIDs or hostnames of the nodes that you want to delete:

    (undercloud)$ openstack server list
  6. Optional: Run the overcloud deploy command with the --update-plan-only option to update the plans with the most recent configurations from the templates. This ensures that the overcloud configuration is up to date before you delete any Compute nodes.

    $ openstack overcloud deploy --update-plan-only \
      --templates  \
      -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
      -e /home/stack/templates/network-environment.yaml \
      -e /home/stack/templates/storage-environment.yaml \
      -e /home/stack/templates/rhel-registration/environment-rhel-registration.yaml \
      [-e |...]

    This step is required if you updated the overcloud node blacklist. For more information about blacklisting overcloud nodes, see Blacklisting nodes.

  7. Delete the Compute nodes from the stack:

    $ openstack overcloud node delete --stack <overcloud> \
     <node_1> ... [node_n]
    • Replace <overcloud> with the name or UUID of the overcloud stack.
    • Replace <node_1>, and optionally all nodes up to [node_n], with the Compute service hostname or UUID of the Compute nodes that you want to delete. Do not use a mix of UUIDs and hostnames. Use either only UUIDs or only hostnames.


      If the node has already been powered off, this command returns a WARNING message:

      Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log
      WARNING: Scale-down configuration error. Manual cleanup of some actions may be necessary. Continuing with node removal.

      You can ignore this message.

  8. Ensure that the openstack overcloud node delete command runs to completion:

    (undercloud)$ openstack stack list

    The status of the overcloud stack shows UPDATE_COMPLETE when the delete operation is complete.

    1. If the IPMI interface of the node that you want to remove is not reachable, the openstack overcloud node delete command fails and the stack enters the UPDATE_FAILED status. Move the node to maintenance mode and rerun the openstack overcloud node delete command:

      $ openstack baremetal node maintenance set <NODE_ID>
    2. Adjust the Compute count and rerun the openstack overcloud deploy command that you used to deploy the existing overcloud.


      If you intend to redeploy the Compute service with the same host name, you must use the existing service records for the redeployed node. If this is the case, skip the remaining steps in this procedure, and proceed with the instructions detailed in Redeploying the Compute service with the same host name.

  9. Check the network agents in your overcloud environment:

    (overcloud)$ openstack network agent list
  10. If any agents appear for the old node, remove them:

    (overcloud)$ for AGENT in $(openstack network agent list --host <scaled_down_node> -c ID -f value) ; do openstack network agent delete "$AGENT" ; done

    Replace <scaled_down_node> with the name of the node to remove.


    In an ML2/OVN deployment, known issues prevent removal of the OVN controller and metadata agents. To track progress on these issues, see BZ1828889 and BZ1738554.

  11. Decrease the ComputeCount parameter in the environment file that contains your node counts. This file is usually named node-info.yaml. For example, decrease the node count from five nodes to three nodes if you removed two nodes:

      ComputeCount: 3

    Decreasing the node count ensures that director does not provision any new nodes when you run openstack overcloud deploy.

  12. If Instance HA is enabled, perform the following actions:

    1. Clean up the Pacemaker resources for the node:

      $ sudo pcs resource delete <scaled_down_node>
      $ sudo cibadmin -o nodes --delete --xml-text '<node id="<scaled_down_node>"/>'
      $ sudo cibadmin -o fencing-topology --delete --xml-text '<fencing-level target="<scaled_down_node>"/>'
      $ sudo cibadmin -o status --delete --xml-text '<node_state id="<scaled_down_node>"/>'
      $ sudo cibadmin -o status --delete-all --xml-text '<node id="<scaled_down_node>"/>' --force
    2. Delete the STONITH device for the node:

      $ sudo pcs stonith delete <device-name>

You can remove the node from the overcloud and re-provision it for other purposes.
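Step 7 requires that you pass either only UUIDs or only hostnames to the openstack overcloud node delete command. The following sketch shows one way to check a list of identifiers before you build the command. The helper names `is_uuid` and `check_node_identifiers` are hypothetical, and the check treats any string that parses as a UUID as a UUID.

```python
import uuid
from typing import List


def is_uuid(value: str) -> bool:
    """Return True if `value` parses as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def check_node_identifiers(identifiers: List[str]) -> str:
    """Return "uuid" or "hostname", or raise ValueError for an empty or mixed list."""
    if not identifiers:
        raise ValueError("no node identifiers given")
    kinds = {is_uuid(item) for item in identifiers}
    if len(kinds) > 1:
        raise ValueError("use either only UUIDs or only hostnames, not a mix")
    return "uuid" if kinds.pop() else "hostname"


# Hypothetical hostnames.
print(check_node_identifiers(["overcloud-compute-0", "overcloud-compute-1"]))
```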

Redeploying the Compute service with the same host name

To redeploy a disabled Compute service, re-enable it after you redeploy a Compute node with the same host name.


  1. Remove the deleted Compute service as a resource provider from the Placement service:

    (undercloud)$ source ~/overcloudrc
    (overcloud)$ openstack resource provider list
    (overcloud)$ openstack resource provider delete <uuid>
  2. Check the status of the Compute service:

    (overcloud)$ openstack compute service list --long
    | ID | Binary       | Host                  | Zone  | Status   | State | Updated At                 | Disabled Reason     |
    | 80 | nova-compute | compute-1.localdomain | nova  | disabled | up    | 2018-07-13T14:35:04.000000 | gets re-provisioned |
  3. When the service state of the redeployed Compute node changes to up, re-enable the service:

    (overcloud)$ openstack compute service set compute-1.localdomain nova-compute --enable
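The condition in step 3 can be checked programmatically. This sketch assumes that you capture the output of `openstack compute service list --long -f json` and that the JSON keys match the column names shown in step 2. The helper name `ready_to_enable` is hypothetical, and the sample data repeats the example row from step 2.

```python
import json


def ready_to_enable(service_list_json: str, host: str) -> bool:
    """Return True if nova-compute on `host` is disabled and its state is up."""
    for service in json.loads(service_list_json):
        if service.get("Binary") == "nova-compute" and service.get("Host") == host:
            return service.get("Status") == "disabled" and service.get("State") == "up"
    # No matching service record was found.
    return False


sample = json.dumps([
    {
        "ID": 80,
        "Binary": "nova-compute",
        "Host": "compute-1.localdomain",
        "Zone": "nova",
        "Status": "disabled",
        "State": "up",
        "Updated At": "2018-07-13T14:35:04.000000",
        "Disabled Reason": "gets re-provisioned",
    }
])
print(ready_to_enable(sample, "compute-1.localdomain"))
```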

16.4. Replacing Ceph Storage nodes

You can use director to replace Ceph Storage nodes in a director-created cluster. For more information, see the Deploying an Overcloud with Containerized Red Hat Ceph guide.

16.5. Replacing Object Storage nodes

Follow the instructions in this section to replace Object Storage nodes without affecting the integrity of the cluster. This example involves a three-node Object Storage cluster in which you want to replace the overcloud-objectstorage-1 node. The goal of the procedure is to add one more node and then remove the overcloud-objectstorage-1 node, so that the new node replaces it.


  1. Increase the Object Storage count using the ObjectStorageCount parameter. This parameter is usually located in node-info.yaml, which is the environment file that contains your node counts:

      ObjectStorageCount: 4

    The ObjectStorageCount parameter defines the quantity of Object Storage nodes in your environment. In this example, scale the quantity of Object Storage nodes from 3 to 4.

  2. Run the deployment command with the updated ObjectStorageCount parameter:

    $ source ~/stackrc
    (undercloud) $ openstack overcloud deploy --templates -e node-info.yaml <environment_files>

    After the deployment command completes, the overcloud contains an additional Object Storage node.

  3. Replicate data to the new node. Before you remove a node, in this case, overcloud-objectstorage-1, wait for a replication pass to finish on the new node. Check the replication pass progress in the /var/log/swift/swift.log file. When the pass finishes, the Object Storage service should log entries similar to the following example:

    Mar 29 08:49:05 localhost object-server: Object replication complete.
    Mar 29 08:49:11 localhost container-server: Replication run OVER
    Mar 29 08:49:13 localhost account-server: Replication run OVER
  4. To remove the old node from the ring, reduce the ObjectStorageCount parameter to omit the old node. In this example, reduce the ObjectStorageCount parameter to 3:

      ObjectStorageCount: 3
  5. Create a new environment file named remove-object-node.yaml. This file identifies and removes the specified Object Storage node. The following content specifies the removal of overcloud-objectstorage-1:

        parameter_defaults:
          ObjectStorageRemovalPolicies:
            [{'resource_list': ['1']}]
  6. Include both the node-info.yaml and remove-object-node.yaml files in the deployment command:

    (undercloud) $ openstack overcloud deploy --templates -e node-info.yaml <environment_files> -e remove-object-node.yaml

Director deletes the Object Storage node from the overcloud and updates the rest of the nodes on the overcloud to accommodate the node removal.


Include all environment files and options from your initial overcloud creation. This includes the same scale parameters for non-Compute nodes.
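To decide whether the replication pass in step 3 has finished, you can scan the log for the three completion messages. The following sketch assumes that the log lines follow the format of the example entries in step 3. The helper name `replication_pass_finished` is hypothetical, and the function does not distinguish an old pass from a new one, so pass it only the lines that were logged after the new node joined the cluster.

```python
from typing import Iterable

# Completion message expected from each server, taken from the example in step 3.
REPLICATION_MARKERS = {
    "object-server": "Object replication complete",
    "container-server": "Replication run OVER",
    "account-server": "Replication run OVER",
}


def replication_pass_finished(log_lines: Iterable[str]) -> bool:
    """Return True when every server type has logged its completion message."""
    seen = set()
    for line in log_lines:
        for server, marker in REPLICATION_MARKERS.items():
            if f" {server}: " in line and marker in line:
                seen.add(server)
    return seen == set(REPLICATION_MARKERS)


sample_lines = [
    "Mar 29 08:49:05 localhost object-server: Object replication complete.",
    "Mar 29 08:49:11 localhost container-server: Replication run OVER",
    "Mar 29 08:49:13 localhost account-server: Replication run OVER",
]
print(replication_pass_finished(sample_lines))
```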

16.6. Blacklisting nodes

You can exclude overcloud nodes from receiving an updated deployment. This is useful in scenarios where you want to scale new nodes and exclude existing nodes from receiving an updated set of parameters and resources from the core heat template collection. This means that the blacklisted nodes are isolated from the effects of the stack operation.

Use the DeploymentServerBlacklist parameter in an environment file to create a blacklist.

Setting the blacklist

The DeploymentServerBlacklist parameter is a list of server names. Write a new environment file, or add the parameter value to an existing custom environment file, and pass the file to the deployment command:

    parameter_defaults:
      DeploymentServerBlacklist:
        - overcloud-compute-0
        - overcloud-compute-1
        - overcloud-compute-2

The server names in the parameter value are the names according to OpenStack Orchestration (heat), not the actual server hostnames.

Include this environment file with your openstack overcloud deploy command:

    $ source ~/stackrc
    (undercloud) $ openstack overcloud deploy --templates \
      -e server-blacklist.yaml \
      [OTHER_OPTIONS]

Heat blacklists any servers in the list from receiving updated heat deployments. After the stack operation completes, any blacklisted servers remain unchanged. You can also power off or stop the os-collect-config agents during the operation.

  • Exercise caution when you blacklist nodes. Only use a blacklist if you fully understand how to apply the requested change with a blacklist in effect. It is possible to create a hung stack or configure the overcloud incorrectly when you use the blacklist feature. For example, if cluster configuration changes apply to all members of a Pacemaker cluster, blacklisting a Pacemaker cluster member during this change can cause the cluster to fail.
  • Do not use the blacklist during update or upgrade procedures. Those procedures have their own methods for isolating changes to particular servers.
  • When you add servers to the blacklist, further changes to those nodes are not supported until you remove the servers from the blacklist. This includes updates, upgrades, scale up, scale down, and node replacement. For example, when you blacklist existing Compute nodes while scaling out the overcloud with new Compute nodes, the blacklisted nodes miss the information added to /etc/hosts and /etc/ssh/ssh_known_hosts. This can cause live migration to fail, depending on the destination host. The Compute nodes are updated with the information added to /etc/hosts and /etc/ssh/ssh_known_hosts during the next overcloud deployment where they are no longer blacklisted. Do not modify the /etc/hosts and /etc/ssh/ssh_known_hosts files manually. To modify these files, run the overcloud deploy command as described in the Clearing the blacklist section.

Clearing the blacklist

To clear the blacklist for subsequent stack operations, set the DeploymentServerBlacklist parameter to an empty array:

    parameter_defaults:
      DeploymentServerBlacklist: []

Do not omit the DeploymentServerBlacklist parameter. If you omit the parameter, the overcloud deployment uses the previously saved value.
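The following sketch renders the blacklist environment file for both cases: setting the blacklist and clearing it. It assumes that the parameter belongs under the `parameter_defaults` key of a heat environment file. The helper name `render_blacklist_environment` is hypothetical. For an empty list, the function still writes the parameter with an empty array, because omitting the parameter keeps the previously saved value.

```python
from typing import List


def render_blacklist_environment(server_names: List[str]) -> str:
    """Return environment file text that sets DeploymentServerBlacklist."""
    lines = ["parameter_defaults:"]
    if not server_names:
        # Keep the parameter with an explicit empty array to clear the blacklist.
        lines.append("  DeploymentServerBlacklist: []")
    else:
        lines.append("  DeploymentServerBlacklist:")
        lines.extend(f"    - {name}" for name in server_names)
    return "\n".join(lines) + "\n"


print(render_blacklist_environment(["overcloud-compute-0", "overcloud-compute-1"]))
print(render_blacklist_environment([]))
```

Remember that the names in the list are the server names according to heat, not the actual server hostnames.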