Chapter 16. Scaling overcloud nodes

If you want to add or remove nodes after the creation of the overcloud, you must update the overcloud.

Warning

Do not use openstack server delete to remove nodes from the overcloud. Follow the procedures in this section to remove and replace nodes correctly.

Note

Ensure that your bare metal nodes are not in maintenance mode before you begin scaling out or removing an overcloud node.

Use the following table to determine support for scaling each node type:

Table 16.1. Scale support for each node type

Node type              Scale up?   Scale down?   Notes

Controller             N           N             You can replace Controller nodes using the procedures in Chapter 17, Replacing Controller nodes.

Compute                Y           Y

Ceph Storage nodes     Y           N             You must have at least 1 Ceph Storage node from the initial overcloud creation.

Object Storage nodes   Y           Y

Important

Ensure that you have at least 10 GB of free space before you scale the overcloud. This free space accommodates image conversion and caching during the node provisioning process.

16.1. Adding nodes to the overcloud

Complete the following steps to add more nodes to the director node pool.

Note

A fresh installation of Red Hat OpenStack Platform does not include certain updates, such as security errata and bug fixes. As a result, if you are scaling up a connected environment that uses the Red Hat Customer Portal or Red Hat Satellite Server, RPM updates are not applied to new nodes. To apply the latest updates to the new overcloud nodes, you must either update the overcloud after the scale-out operation or update the base overcloud image before you scale out.

Procedure

  1. Create a new JSON file called newnodes.json that contains the details of each new node that you want to register:

    {
      "nodes":[
        {
            "mac":[
                "dd:dd:dd:dd:dd:dd"
            ],
            "cpu":"4",
            "memory":"6144",
            "disk":"40",
            "arch":"x86_64",
            "pm_type":"ipmi",
            "pm_user":"admin",
            "pm_password":"p@55w0rd!",
            "pm_addr":"192.168.24.207"
        },
        {
            "mac":[
                "ee:ee:ee:ee:ee:ee"
            ],
            "cpu":"4",
            "memory":"6144",
            "disk":"40",
            "arch":"x86_64",
            "pm_type":"ipmi",
            "pm_user":"admin",
            "pm_password":"p@55w0rd!",
            "pm_addr":"192.168.24.208"
        }
      ]
    }
  2. Register the new nodes:

    $ source ~/stackrc
    $ openstack overcloud node import newnodes.json
  3. After you register the new nodes, launch the introspection process for each new node:

    $ openstack overcloud node introspect <node_UUID> --provide
    • Replace <node_UUID> with the UUID of the node to add. This process detects and benchmarks the hardware properties of the nodes.
  4. Configure the image properties for the node:

    $ openstack overcloud node configure <node_UUID>
    • Replace <node_UUID> with the UUID of the node to configure.
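
    As an optional check that is not part of the documented procedure, you can confirm that the new nodes are ready for deployment by listing them and reviewing the Provisioning State column, which shows available after introspection with the --provide option finishes:

    $ openstack baremetal node list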

16.2. Increasing node counts for roles

Complete the following steps to scale overcloud nodes for a specific role, such as a Compute node.

Procedure

  1. Tag each new node with the role you want. For example, to tag a node with the Compute role, run the following command:

    $ openstack baremetal node set --property capabilities='profile:compute,boot_option:local' <node_UUID>
    • Replace <node_UUID> with the UUID of the node to tag.
  2. To scale the overcloud, you must edit the environment file that contains your node counts and re-deploy the overcloud. For example, to scale your overcloud to 5 Compute nodes, edit the ComputeCount parameter:

    parameter_defaults:
      ...
      ComputeCount: 5
      ...
  3. Rerun the deployment command with the updated file, which in this example is called node-info.yaml:

    $ openstack overcloud deploy --templates \
      -e /home/stack/templates/node-info.yaml \
      -e [..]

    Ensure that you include all environment files and options from your initial overcloud creation. This includes the same scale parameters for non-Compute nodes.

  4. Wait until the deployment operation completes.
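
    As an optional check that is not part of the documented procedure, you can confirm that the new nodes are registered and active by listing the overcloud servers from the undercloud:

    $ source ~/stackrc
    $ openstack server list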

16.3. Removing or replacing Compute nodes

In some situations, you need to remove a Compute node from the overcloud. For example, you might need to replace a problematic Compute node or remove a group of Compute nodes to scale down your cloud. When you delete a Compute node, the node’s index is added to the blocklist by default to prevent the index from being reused during scale-out operations.

You can replace a removed Compute node after you have removed the node from your overcloud deployment.

Prerequisites

  • The Compute service is disabled on the nodes that you want to remove to prevent the nodes from scheduling new instances. To confirm that the Compute service is disabled, use the following command to list the compute services:

    (overcloud)$ openstack compute service list

    If the Compute service is not disabled, disable it:

    (overcloud)$ openstack compute service set <hostname> nova-compute --disable
  • Replace <hostname> with the hostname of the Compute node to disable.

    Tip

    Use the --disable-reason option to add a short explanation on why the service is being disabled. This is useful if you intend to redeploy the Compute service.
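
    For example, to disable the Compute service on a node with the hypothetical hostname compute-1.localdomain and record the reason:

    (overcloud)$ openstack compute service set compute-1.localdomain nova-compute --disable --disable-reason "Node scheduled for removal"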

  • The workloads on the Compute nodes have been migrated to other Compute nodes. For more information, see Migrating virtual machine instances between Compute nodes.
  • If Instance HA is enabled, choose one of the following options:

    • If the Compute node is accessible, log in to the Compute node as the root user and perform a clean shutdown with the shutdown -h now command.
    • If the Compute node is not accessible, log in to a Controller node as the root user and disable the STONITH device for the Compute node:

      $ sudo pcs stonith disable <compute_UUID>

      Then source the stackrc undercloud credentials file and power off the bare metal node:

      $ source ~/stackrc
      (undercloud)$ openstack baremetal node power off <compute_UUID>
  • Replace <compute_UUID> with the UUID of the Compute node to remove.

Procedure

  1. Source the stackrc undercloud credentials file:

    $ source ~/stackrc
  2. Identify the name of the overcloud stack:

    (undercloud)$ openstack stack list
  3. Identify the UUIDs or hostnames of the Compute nodes that you want to delete:

    (undercloud)$ openstack server list
  4. Optional: Run the overcloud deploy command with the --update-plan-only option to update the plans with the most recent configurations from the templates. This ensures that the overcloud configuration is up-to-date before you delete any Compute nodes:

    (undercloud)$ openstack overcloud deploy --stack <overcloud> --update-plan-only \
      --templates  \
      -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
      -e /home/stack/templates/network-environment.yaml \
      -e /home/stack/templates/storage-environment.yaml \
      -e /home/stack/templates/rhel-registration/environment-rhel-registration.yaml \
      -e [...]

    Replace <overcloud> with the name of the overcloud stack.

    Note

    You must update the overcloud plans if you updated the overcloud node blocklist. For more information about adding overcloud nodes to the blocklist, see Blocklisting nodes.

  5. Delete the Compute nodes from the stack:

    (undercloud)$ openstack overcloud node delete --stack <overcloud> \
     <node_1> ... [node_n]
    • Replace <overcloud> with the name of the overcloud stack.
    • Replace <node_1>, and optionally all nodes up to [node_n], with the Compute service hostname or UUID of the Compute nodes you want to delete. Do not use a mix of UUIDs and hostnames. Use either only UUIDs or only hostnames.

      Note

      If the node has already been powered off, this command returns a WARNING message:

      Ansible failed, check log at /var/lib/mistral/overcloud/ansible.log
      WARNING: Scale-down configuration error. Manual cleanup of some actions may be necessary. Continuing with node removal.

      To address the issues caused by the powered-off node, complete steps 1 through 8 in Completing the removal of an unreachable Compute node, and then proceed with the next step in this procedure.
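
      For example, to delete two Compute nodes by hostname from a stack named overcloud (the node names shown are illustrative):

      (undercloud)$ openstack overcloud node delete --stack overcloud \
       overcloud-compute-2 overcloud-compute-3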

  6. Wait until the Compute nodes are deleted.
  7. Delete the network agents for each node that you deleted:

    (undercloud)$ source ~/overcloudrc
    (overcloud)$ for AGENT in $(openstack network agent list \
      --host <scaled_down_node> -c ID -f value) ; \
      do openstack network agent delete $AGENT ; done

    Replace <scaled_down_node> with the hostname of the node that you deleted.

  8. Check the command output. Because of a bug in RHOSP 16.1.7 and older, you might see a message indicating that the agents could not be deleted.

    Bad agent request: OVN agents cannot be deleted.

    If you do not see a Bad agent request message, proceed to the next step.

    If you see a Bad agent request message, go to Deleting the network agents: workaround for bug. After completing the workaround procedure, return here and proceed to the next step.

  9. Check the status of the overcloud stack when the node deletion is complete:

    (overcloud)$ source ~/stackrc
    (undercloud)$ openstack stack list

    Table 16.2. Result

    Status            Description

    UPDATE_COMPLETE   The Compute node deletion completed successfully. Proceed to the next step.

    UPDATE_FAILED     The Compute node deletion failed. A common reason for a failed Compute node deletion is an unreachable IPMI interface on a node that you want to remove. When the deletion fails, you must complete the process manually. Proceed to Completing the removal of an unreachable Compute node to complete the Compute node removal.

  10. If Instance HA is enabled, perform the following actions:

    1. Clean up the Pacemaker resources for the Compute node:

      $ sudo pcs resource delete <compute_UUID>
      $ sudo cibadmin -o nodes --delete --xml-text '<node id="<compute_UUID>"/>'
      $ sudo cibadmin -o fencing-topology --delete --xml-text '<fencing-level target="<compute_UUID>"/>'
      $ sudo cibadmin -o status --delete --xml-text '<node_state id="<compute_UUID>"/>'
      $ sudo cibadmin -o status --delete-all --xml-text '<node id="<compute_UUID>"/>' --force
    2. Delete the STONITH device for the node:

      $ sudo pcs stonith delete <compute_UUID>
  11. If you are not replacing the removed Compute nodes on the overcloud, then decrease the ComputeCount parameter in the environment file that contains your node counts. This file is usually named node-info.yaml. For example, decrease the node count from four nodes to three nodes if you removed one node:

    parameter_defaults:
      ...
      ComputeCount: 3

    Decreasing the node count ensures that director does not provision any new nodes when you run openstack overcloud deploy.
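
    When you next rerun the deployment command, include the updated node counts file together with all environment files and options from your initial overcloud creation, for example (the file path shown is illustrative):

    $ openstack overcloud deploy --templates \
      -e /home/stack/templates/node-info.yaml \
      -e [...]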

    If you are replacing the removed Compute node on your overcloud deployment, see Replacing a removed Compute node.

16.3.1. Completing the removal of an unreachable Compute node

If the openstack overcloud node delete command failed due to an unreachable node, then you must manually complete the removal of the Compute node from the overcloud.

Prerequisites

  • The openstack overcloud node delete command failed because the Compute node is unreachable.

Procedure

  1. Identify the UUID of the overcloud stack:

    (undercloud)$ openstack stack list
  2. Identify the UUID of the node that you want to manually delete:

    (undercloud)$ openstack baremetal node list
  3. Set the node that you want to delete to maintenance mode:

    (undercloud)$ openstack baremetal node maintenance set <UUID>
    • Replace <UUID> with the UUID of the node to put into maintenance mode.
  4. Wait for the Compute service to synchronize its state with the Bare Metal service. This can take up to four minutes.
  5. Source the overcloud configuration:

    (undercloud)$ source ~/overcloudrc
  6. Confirm that the Compute service is disabled on the deleted node on the overcloud, to prevent the node from scheduling new instances:

    (overcloud)$ openstack compute service list

    If the Compute service is not disabled, disable it:

    (overcloud)$ openstack compute service set <hostname> nova-compute --disable
    • Replace <hostname> with the hostname of the Compute node.

      Tip

      Use the --disable-reason option to add a short explanation on why the service is being disabled. This is useful if you intend to redeploy the Compute service.

  7. Remove the Compute service from the deleted Compute node:

    (overcloud)$ openstack compute service delete <service_id>
    • Replace <service_id> with the ID of the Compute service that was running on the deleted node.
  8. Remove the deleted Compute service as a resource provider from the Placement service:

    (overcloud)$ openstack resource provider list
    (overcloud)$ openstack resource provider delete <UUID>
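
    The resource provider for a Compute node is normally named after the node’s hypervisor hostname, so you can narrow the list output to the removed node, for example (the hostname shown is illustrative):

    (overcloud)$ openstack resource provider list | grep overcloud-compute-3

    Use the uuid value from the matching row as <UUID> in the delete command.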
  9. Source the undercloud configuration:

    (overcloud)$ source ~/stackrc
  10. Delete the Compute node from the stack:

    (undercloud)$ openstack overcloud node delete --stack <overcloud> <node>
    • Replace <overcloud> with the name or UUID of the overcloud stack.
    • Replace <node> with the Compute service hostname or UUID of the Compute node that you want to delete.

      Note

      If the node has already been powered off, this command returns a WARNING message:

      Ansible failed, check log at `/var/lib/mistral/overcloud/ansible.log`
      WARNING: Scale-down configuration error. Manual cleanup of some actions may be necessary. Continuing with node removal.

      You can ignore this message.

  11. Wait for the overcloud node to be deleted.
  12. Source the overcloud configuration:

    (undercloud)$ source ~/overcloudrc
  13. Delete the network agents for the node that you deleted:

    (overcloud)$ for AGENT in $(openstack network agent list \
      --host <scaled_down_node> -c ID -f value) ; \
      do openstack network agent delete $AGENT ; done
    • Replace <scaled_down_node> with the name of the node you deleted.
  14. Check the command output. Because of a bug in RHOSP 16.1.7 and older, you might see a message indicating that the agents could not be deleted.

    Bad agent request: OVN agents cannot be deleted.

    If you do not see this message, proceed to the next step.

    If you see this message, complete the procedure in Deleting the network agents: workaround for bug. After completing the workaround procedure, return here and proceed to the next step.

  15. Source the undercloud configuration:

    (overcloud)$ source ~/stackrc
  16. Check the status of the overcloud stack when the node deletion is complete:

    (undercloud)$ openstack stack list

    Table 16.3. Result

    Status            Description

    UPDATE_COMPLETE   The Compute node deletion completed successfully. Proceed to the next step.

    UPDATE_FAILED     The Compute node deletion failed. If the Compute node deletion fails while the node is in maintenance mode, the problem might be with the hardware. Check the hardware.

  17. If Instance HA is enabled, perform the following actions:

    1. Clean up the Pacemaker resources for the node:

      $ sudo pcs resource delete <scaled_down_node>
      $ sudo cibadmin -o nodes --delete --xml-text '<node id="<scaled_down_node>"/>'
      $ sudo cibadmin -o fencing-topology --delete --xml-text '<fencing-level target="<scaled_down_node>"/>'
      $ sudo cibadmin -o status --delete --xml-text '<node_state id="<scaled_down_node>"/>'
      $ sudo cibadmin -o status --delete-all --xml-text '<node id="<scaled_down_node>"/>' --force
    2. Delete the STONITH device for the node:

      $ sudo pcs stonith delete <device-name>
  18. If you are not replacing the removed Compute node on the overcloud, then decrease the ComputeCount parameter in the environment file that contains your node counts. This file is usually named node-info.yaml. For example, decrease the node count from four nodes to three nodes if you removed one node:

    parameter_defaults:
      ...
      ComputeCount: 3
      ...

    Decreasing the node count ensures that director does not provision any new nodes when you run openstack overcloud deploy.

    If you are replacing the removed Compute node on your overcloud deployment, see Replacing a removed Compute node.

16.3.2. Deleting the network agents: workaround for bug

After you remove a Compute node, you must delete the associated network agent. If your deployment uses RHOSP 16.1.7 or earlier, a bug prevents you from deleting network agents as expected. For more information, see BZ1788336 (ovn-controllers are listed as agents but cannot be removed).

With this bug, when you attempt to delete the agent as instructed, the Networking service displays the following error message:

Bad agent request: OVN agents cannot be deleted.

If you see that error message, perform the following steps to delete the agent.

Prerequisites

  • Your attempt to delete network agents after removing a Compute node failed, as indicated by the following error message:

    Bad agent request: OVN agents cannot be deleted.

Procedure

  1. List the overcloud nodes:

    (undercloud)$ openstack server list
  2. Log in to a Controller node as a user with root privileges:

    $ ssh heat-admin@controller-0.ctlplane
  3. If you have not done so already, set up command aliases to simplify access to the ovn-sbctl command on the ovn_controller container. For more information, see Creating aliases for OVN troubleshooting commands.
  4. Obtain the IP address from the ovn-controller.log file:

    $ sudo less /var/log/containers/openvswitch/ovn-controller.log

    If ovn-controller.log is empty, try ovn-controller.log.1.

  5. Confirm that the IP address is correct:

    $ ovn-sbctl list encap |grep -a3 <IP_address_from_ovn-controller.log>

    Replace <IP_address_from_ovn-controller.log> with the IP address from the controller log file.

  6. Delete the chassis that contains the IP address:

    $ ovn-sbctl chassis-del <chassis-name>

    Replace <chassis-name> with the chassis_name value from the output of the ovn-sbctl list encap command in the previous step.

  7. Check the Chassis_Private table to confirm that the chassis has been removed:

    $ ovn-sbctl find Chassis_Private chassis="[]"
  8. If any chassis are listed, remove each one with the following command:

    $ ovn-sbctl destroy Chassis_Private <listed_name>

    Replace <listed_name> with the name of the chassis to delete.

  9. Return to the procedure to complete the removal of the Compute node.

16.3.3. Replacing a removed Compute node

To replace a removed Compute node on your overcloud deployment, you can register and inspect a new Compute node or re-add the removed Compute node. You must also configure your overcloud to provision the node.

Procedure

  1. Optional: To reuse the index of the removed Compute node, configure the RemovalPoliciesMode and the RemovalPolicies parameters for the role to replace the blocklist when a Compute node is removed:

    parameter_defaults:
      <RoleName>RemovalPoliciesMode: update
      <RoleName>RemovalPolicies: [{'resource_list': []}]
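
    For example, assuming the default Compute role, the settings look like the following:

    parameter_defaults:
      ComputeRemovalPoliciesMode: update
      ComputeRemovalPolicies: [{'resource_list': []}]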
  2. Replace the removed Compute node:

    • To add a new Compute node, register, inspect, and tag the new node to prepare it for provisioning. For more information, see Configuring a basic overcloud.
    • To re-add a Compute node that you removed manually, remove the node from maintenance mode:

      $ openstack baremetal node maintenance unset <node_uuid>
  3. Rerun the openstack overcloud deploy command that you used to deploy the existing overcloud.
  4. Wait until the deployment process completes.
  5. Confirm that director has successfully registered the new Compute node:

    $ openstack baremetal node list
  6. If you performed step 1 to set the RemovalPoliciesMode for the role to update, you must reset the RemovalPoliciesMode for the role to the default value, append, so that the Compute node index is added to the current blocklist when a Compute node is removed:

    parameter_defaults:
      <RoleName>RemovalPoliciesMode: append
  7. Rerun the openstack overcloud deploy command that you used to deploy the existing overcloud.

16.4. Preserving hostnames when replacing nodes that use predictable IP addresses and HostNameMap

If you configured your overcloud to use predictable IP addresses, and HostNameMap to map heat-based hostnames to the hostnames of pre-provisioned nodes, then you must configure your overcloud to map the new replacement node index to an IP address and hostname.

Procedure

  1. Log in to the undercloud as the stack user.
  2. Source the stackrc file:

    $ source ~/stackrc
  3. Retrieve the physical_resource_id and the removed_rsrc_list for the resource you want to replace:

    $ openstack stack resource show <stack> <role>
    • Replace <stack> with the name of the stack the resource belongs to, for example, overcloud.
    • Replace <role> with the name of the role that you want to replace the node for, for example, Compute.

      Example output:

      +------------------------+-----------------------------------------------------------+
      | Field                  | Value                                                     |
      +------------------------+-----------------------------------------------------------+
      | attributes             | {u'attributes': None, u'refs': None, u'refs_map': None,   |
      |                        | u'removed_rsrc_list': [u'2', u'3']}          | 1
      | creation_time          | 2017-09-05T09:10:42Z                                      |
      | description            |                                                           |
      | links                  | [{u'href': u'http://192.168.24.1:8004/v1/bd9e6da805594de9 |
      |                        | 8d4a1d3a3ee874dd/stacks/overcloud/1c7810c4-8a1e-          |
      |                        | 4d61-a5d8-9f964915d503/resources/Compute', u'rel':        |
      |                        | u'self'}, {u'href': u'http://192.168.24.1:8004/v1/bd9e6da |
      |                        | 805594de98d4a1d3a3ee874dd/stacks/overcloud/1c7810c4-8a1e- |
      |                        | 4d61-a5d8-9f964915d503', u'rel': u'stack'}, {u'href': u'h |
      |                        | ttp://192.168.24.1:8004/v1/bd9e6da805594de98d4a1d3a3ee874 |
      |                        | dd/stacks/overcloud-Compute-zkjccox63svg/7632fb0b-        |
      |                        | 80b1-42b3-9ea7-6114c89adc29', u'rel': u'nested'}]         |
      | logical_resource_id    | Compute                                                   |
      | physical_resource_id   | 7632fb0b-80b1-42b3-9ea7-6114c89adc29                      |
      | required_by            | [u'AllNodesDeploySteps',                                  |
      |                        | u'ComputeAllNodesValidationDeployment',                   |
      |                        | u'AllNodesExtraConfig', u'ComputeIpListMap',              |
      |                        | u'ComputeHostsDeployment', u'UpdateWorkflow',             |
      |                        | u'ComputeSshKnownHostsDeployment', u'hostsConfig',        |
      |                        | u'SshKnownHostsConfig', u'ComputeAllNodesDeployment']     |
      | resource_name          | Compute                                                   |
      | resource_status        | CREATE_COMPLETE                                           |
      | resource_status_reason | state changed                                             |
      | resource_type          | OS::Heat::ResourceGroup                                   |
      | updated_time           | 2017-09-05T09:10:42Z                                      |
      +------------------------+-----------------------------------------------------------+
      1
      The removed_rsrc_list lists the indexes of nodes that have already been removed for the resource.
  4. Retrieve the resource_name to determine the maximum index that heat has applied to a node for this resource:

    $ openstack stack resource list <physical_resource_id>
    • Replace <physical_resource_id> with the ID you retrieved in step 3.
  5. Use the resource_name and the removed_rsrc_list to determine the next index that heat will apply to a new node:

    • If removed_rsrc_list is empty, then the next index will be (current_maximum_index) + 1.
    • If removed_rsrc_list includes the value (current_maximum_index) + 1, then the next index will be the next available index.
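
    For example, if the current maximum index is 5 and removed_rsrc_list contains [u'2', u'3'], the next index is 6. If removed_rsrc_list also contained u'6', the next index would be the next available index, which is 7.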
  6. Retrieve the ID of the replacement bare-metal node:

    $ openstack baremetal node list
  7. Update the capability of the replacement node with the new index:

    $ openstack baremetal node set --property capabilities='node:<role>-<index>,boot_option:local' <node>
    • Replace <role> with the name of the role that you want to replace the node for, for example, compute.
    • Replace <index> with the index calculated in step 5.
    • Replace <node> with the ID of the bare metal node.

    The Compute scheduler uses the node capability to match the node on deployment.
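
    For example, if the role is compute and the index that you calculated in step 5 is 8, the capability value becomes node:compute-8:

    $ openstack baremetal node set --property capabilities='node:compute-8,boot_option:local' <node>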

  8. Assign a hostname to the new node by adding the index to the HostnameMap configuration, for example:

    parameter_defaults:
      ControllerSchedulerHints:
        'capabilities:node': 'controller-%index%'
      ComputeSchedulerHints:
        'capabilities:node': 'compute-%index%'
      HostnameMap:
        overcloud-controller-0: overcloud-controller-prod-123-0
        overcloud-controller-1: overcloud-controller-prod-456-0 1
        overcloud-controller-2: overcloud-controller-prod-789-0
        overcloud-controller-3: overcloud-controller-prod-456-0 2
        overcloud-compute-0: overcloud-compute-prod-abc-0
        overcloud-compute-3: overcloud-compute-prod-abc-3 3
        overcloud-compute-8: overcloud-compute-prod-abc-3 4
        ....
    1
    Node that you are removing and replacing with the new node.
    2
    New node.
    3
    Node that you are removing and replacing with the new node.
    4
    New node.
    Note

    Do not delete the mapping for the removed node from HostnameMap.

  9. Add the IP address for the replacement node to the end of each network IP address list in your network IP address mapping file, ips-from-pool-all.yaml. In the following example, the IP address for the new index, overcloud-controller-3, is added to the end of the IP address list for each ControllerIPs network, and is assigned the same IP address as overcloud-controller-1 because it replaces overcloud-controller-1. The IP address for the new index, overcloud-compute-8, is also added to the end of the IP address list for each ComputeIPs network, and is assigned the same IP address as the index it replaces, overcloud-compute-3:

    parameter_defaults:
      ControllerIPs:
        ...
        internal_api:
          - 192.168.1.10  1
          - 192.168.1.11  2
          - 192.168.1.12  3
          - 192.168.1.11  4
        ...
        storage:
          - 192.168.2.10
          - 192.168.2.11
          - 192.168.2.12
          - 192.168.2.11
        ...
    
      ComputeIPs:
        ...
        internal_api:
          - 172.17.0.10 5
          - 172.17.0.11 6
          - 172.17.0.11 7
        ...
        storage:
          - 172.17.0.10
          - 172.17.0.11
          - 172.17.0.11
        ...
    1
    IP address assigned to index 0, host name overcloud-controller-prod-123-0.
    2
    IP address assigned to index 1, host name overcloud-controller-prod-456-0. This node is replaced by index 3. Do not remove this entry.
    3
    IP address assigned to index 2, host name overcloud-controller-prod-789-0.
    4
    IP address assigned to index 3, host name overcloud-controller-prod-456-0. This is the new node that replaces index 1.
    5
    IP address assigned to index 0, host name overcloud-compute-0.
    6
    IP address assigned to index 1, host name overcloud-compute-3. This node is replaced by index 2. Do not remove this entry.
    7
    IP address assigned to index 2, host name overcloud-compute-8. This is the new node that replaces index 1.

16.5. Replacing Ceph Storage nodes

You can use director to replace Ceph Storage nodes in a director-created cluster. For more information, see the Deploying an Overcloud with Containerized Red Hat Ceph guide.

16.6. Replacing Object Storage nodes

Follow the instructions in this section to understand how to replace Object Storage nodes without impacting the integrity of the cluster. This example involves a three-node Object Storage cluster in which you want to replace the overcloud-objectstorage-1 node. The goal of the procedure is to add one more node and then remove the overcloud-objectstorage-1 node. The new node replaces the overcloud-objectstorage-1 node.

Procedure

  1. Increase the Object Storage count using the ObjectStorageCount parameter. This parameter is usually located in node-info.yaml, which is the environment file that contains your node counts:

    parameter_defaults:
      ObjectStorageCount: 4

    The ObjectStorageCount parameter defines the quantity of Object Storage nodes in your environment. In this example, scale the quantity of Object Storage nodes from 3 to 4.

  2. Run the deployment command with the updated ObjectStorageCount parameter:

    $ source ~/stackrc
    $ openstack overcloud deploy --templates -e node-info.yaml <environment_files>

    After the deployment command completes, the overcloud contains an additional Object Storage node.

  3. Replicate data to the new node. Before you remove a node, in this case, overcloud-objectstorage-1, wait for a replication pass to finish on the new node. Check the replication pass progress in the /var/log/swift/swift.log file. When the pass finishes, the Object Storage service should log entries similar to the following example:

    Mar 29 08:49:05 localhost object-server: Object replication complete.
    Mar 29 08:49:11 localhost container-server: Replication run OVER
    Mar 29 08:49:13 localhost account-server: Replication run OVER
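
    For example, to follow the replication progress while you wait, you can watch the log file (a simple check, run on the new Object Storage node):

    $ sudo tail -f /var/log/swift/swift.log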
  4. To remove the old node from the ring, reduce the ObjectStorageCount parameter to omit the old node. In this example, reduce the ObjectStorageCount parameter to 3:

    parameter_defaults:
      ObjectStorageCount: 3
  5. Create a new environment file named remove-object-node.yaml. This file identifies and removes the specified Object Storage node. The following content specifies the removal of overcloud-objectstorage-1:

    parameter_defaults:
      ObjectStorageRemovalPolicies:
        [{'resource_list': ['1']}]
  6. Include both the node-info.yaml and remove-object-node.yaml files in the deployment command:

    $ openstack overcloud deploy --templates -e node-info.yaml <environment_files> -e remove-object-node.yaml

Director deletes the Object Storage node from the overcloud and updates the rest of the nodes on the overcloud to accommodate the node removal.

Important

Include all environment files and options from your initial overcloud creation. This includes the same scale parameters for non-Compute nodes.

16.7. Using skip deploy identifier

During a stack update operation, Puppet reapplies all manifests by default. This can result in a time-consuming operation, which might not be required.

To override the default operation, use the skip-deploy-identifier option.

$ openstack overcloud deploy --skip-deploy-identifier

Use this option if you do not want the deployment command to generate a unique identifier for the DeployIdentifier parameter. The software configuration deployment steps only trigger if there is an actual change to the configuration. Use this option with caution and only if you are confident that you do not need to run the software configuration, such as scaling out certain roles.
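
For example, when you scale out a role and do not need to rerun the software configuration on unchanged nodes, the deployment command might look like the following (the environment file path is illustrative):

$ openstack overcloud deploy --templates \
  --skip-deploy-identifier \
  -e /home/stack/templates/node-info.yaml \
  -e [...]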

Note

If there is a change to the Puppet manifest or hieradata, Puppet reapplies all manifests even when --skip-deploy-identifier is specified.

16.8. Blocklisting nodes

You can exclude overcloud nodes from receiving an updated deployment. This is useful in scenarios where you want to scale new nodes and exclude existing nodes from receiving an updated set of parameters and resources from the core heat template collection. This means that the blocklisted nodes are isolated from the effects of the stack operation.

Use the DeploymentServerBlacklist parameter in an environment file to create a blocklist.

Setting the blocklist

The DeploymentServerBlacklist parameter is a list of server names. Write a new environment file, or add the parameter value to an existing custom environment file and pass the file to the deployment command:

parameter_defaults:
  DeploymentServerBlacklist:
    - overcloud-compute-0
    - overcloud-compute-1
    - overcloud-compute-2
Note

The server names in the parameter value are the names according to OpenStack Orchestration (heat), not the actual server hostnames.
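
To check these names, you can list the overcloud servers from the undercloud; in a default deployment, the Name column typically shows the heat-assigned names, such as overcloud-compute-0:

$ source ~/stackrc
$ openstack server list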

Include this environment file with your openstack overcloud deploy command:

$ source ~/stackrc
$ openstack overcloud deploy --templates \
  -e server-blocklist.yaml \
  -e [...]

Heat blocklists any servers in the list from receiving updated heat deployments. After the stack operation completes, any blocklisted servers remain unchanged. You can also power off or stop the os-collect-config agents during the operation.

Warning
  • Exercise caution when you blocklist nodes. Only use a blocklist if you fully understand how to apply the requested change with a blocklist in effect. It is possible to create a hung stack or configure the overcloud incorrectly when you use the blocklist feature. For example, if cluster configuration changes apply to all members of a Pacemaker cluster, blocklisting a Pacemaker cluster member during this change can cause the cluster to fail.
  • Do not use the blocklist during update or upgrade procedures. Those procedures have their own methods for isolating changes to particular servers.
  • When you add servers to the blocklist, further changes to those nodes are not supported until you remove the server from the blocklist. This includes updates, upgrades, scale up, scale down, and node replacement. For example, when you blocklist existing Compute nodes while scaling out the overcloud with new Compute nodes, the blocklisted nodes miss the information added to /etc/hosts and /etc/ssh/ssh_known_hosts. This can cause live migration to fail, depending on the destination host. The Compute nodes are updated with the information added to /etc/hosts and /etc/ssh/ssh_known_hosts during the next overcloud deployment where they are no longer blocklisted. Do not modify the /etc/hosts and /etc/ssh/ssh_known_hosts files manually. To modify the /etc/hosts and /etc/ssh/ssh_known_hosts files, run the overcloud deploy command as described in the Clearing the Blocklist section.

Clearing the blocklist

To clear the blocklist for subsequent stack operations, edit the DeploymentServerBlacklist to use an empty array:

parameter_defaults:
  DeploymentServerBlacklist: []
Warning

Do not omit the DeploymentServerBlacklist parameter. If you omit the parameter, the overcloud deployment uses the previously saved value.