Chapter 10. Scaling Compute nodes with director Operator
If you require more or fewer compute resources for your overcloud, you can scale the number of Compute nodes according to your requirements.
10.1. Adding Compute nodes to your overcloud with the director Operator
To add more Compute nodes to your overcloud, you must increase the node count for the compute OpenStackBaremetalSet resource. When a new node is provisioned, a new OpenStackConfigGenerator resource is created to generate a new set of Ansible playbooks. Use the OpenStackConfigVersion to create or update the OpenStackDeploy object to reapply the Ansible configuration to your overcloud.
Prerequisites
- Ensure your OpenShift Container Platform cluster is operational and you have installed the director Operator correctly.
- Deploy and configure an overcloud that runs in your OCP cluster.
- Ensure that you have installed the oc command line tool on your workstation.
- Check that you have enough hosts in a ready state in the openshift-machine-api namespace. Run the oc get baremetalhosts -n openshift-machine-api command to check the hosts available. For more information on managing your bare metal hosts, see "Managing bare metal hosts".
Procedure
Modify the YAML configuration for the compute OpenStackBaremetalSet and increase the count parameter for the resource:
$ oc patch osbms compute --type=merge --patch '{"spec":{"count":3}}' -n openstack
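If you script the scaling step, it can help to build the merge patch from a variable and reject malformed input before it reaches the cluster. The following is a minimal sketch; the function name count_patch is hypothetical and not part of the director Operator or the oc client.

```shell
# Hypothetical helper: print the merge patch for a desired node count.
# Rejects empty, non-numeric, and zero-padded values so that the
# generated JSON is always valid.
count_patch() {
  case "$1" in
    ''|*[!0-9]*|0?*)
      echo "count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"count":%s}}\n' "$1"
}

# Usage (not run here): apply the patch to the compute OpenStackBaremetalSet.
# oc patch osbms compute --type=merge --patch "$(count_patch 3)" -n openstack
```

The same helper can be used when you later decrease the count to remove a node.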
The OpenStackBaremetalSet resource automatically provisions new nodes with the Red Hat Enterprise Linux base operating system. Wait until the provisioning process completes. Check the nodes periodically to determine the readiness of the nodes:
$ oc get baremetalhosts -n openshift-machine-api
$ oc get openstackbaremetalset
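Instead of checking by hand, you can poll the resource until the expected number of nodes is ready. The following sketch assumes that the status.provisioningStatus.readyCount field of the compute OpenStackBaremetalSet reflects the number of provisioned nodes, and that jq is installed. The function names, retry count, and delay are illustrative assumptions.

```shell
# Hypothetical helper: read the ready node count from the compute
# OpenStackBaremetalSet. Prints 0 if the field is not set yet.
get_ready_count() {
  oc get openstackbaremetalset compute -n openstack -o json \
    | jq -r '.status.provisioningStatus.readyCount // 0'
}

# Poll until the ready count equals the wanted count.
#   $1: wanted count
#   $2: maximum number of attempts (default 60)
#   $3: delay between attempts in seconds (default 30)
# Returns 0 on success and 1 if the attempts are exhausted.
wait_for_ready_count() {
  want=$1
  tries=${2:-60}
  delay=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$(get_ready_count)" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (not run here): wait for three ready nodes, up to 30 minutes.
# wait_for_ready_count 3 60 30 && echo "provisioning complete"
```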
Generate the Ansible playbooks by using OpenStackConfigGenerator. For more information, see Configuring overcloud software with the director Operator.
10.2. Removing Compute nodes from your overcloud with the director Operator
To remove a Compute node from your overcloud, you must disable the Compute node, mark it for deletion, and decrease the node count for the compute OpenStackBaremetalSet resource.
If you scale the overcloud with a new node in the same role, the node reuses the host names starting with the lowest ID suffix and corresponding IP reservation.
Prerequisites
- The workloads on the Compute nodes have been migrated to other Compute nodes. For more information, see Migrating virtual machine instances between Compute nodes.
Procedure
Access the remote shell for openstackclient:
$ oc rsh -n openstack openstackclient
Identify the Compute node that you want to remove:
$ openstack compute service list
Disable the Compute service on the node so that no new instances are scheduled on it:
$ openstack compute service set <hostname> nova-compute --disable
Annotate the bare-metal node to prevent Metal3 from starting the node:
$ oc annotate baremetalhost <node> baremetalhost.metal3.io/detached=true
$ oc logs --since=1h <metal3-pod> metal3-baremetal-operator | grep -i detach
$ oc get baremetalhost <node> -o json | jq .status.operationalStatus
"detached"
- Replace <node> with the name of the BareMetalHost resource.
- Replace <metal3-pod> with the name of your metal3 pod.
Log in to the Compute node as the root user and shut down the bare-metal node:
[root@compute-0 ~]# shutdown -h now
If the Compute node is not accessible, complete the following steps:
- Log in to a Controller node as the root user. If Instance HA is enabled, disable the STONITH device for the Compute node:
  [root@controller-0 ~]# pcs stonith disable <stonith_resource_name>
  - Replace <stonith_resource_name> with the name of the STONITH resource that corresponds to the node. The resource name uses the format <resource_agent>-<host_mac>. You can find the resource agent and the host MAC address in the FencingConfig section of the fencing.yaml file.
- Use IPMI to power off the bare-metal node. For more information, see your hardware vendor documentation.
Retrieve the BareMetalHost resource that corresponds to the node that you want to remove:
$ oc get openstackbaremetalset compute -o json | jq '.status.baremetalHosts | to_entries[] | "\(.key) => \(.value | .hostRef)"'
"compute-0 => openshift-worker-3"
"compute-1 => openshift-worker-4"
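When you script the removal, you can look up the BareMetalHost name for one overcloud host name instead of reading the full list. The following sketch assumes that you add the -r option to the jq command so that it prints unquoted "hostname => hostRef" lines. The function name host_ref_for is hypothetical.

```shell
# Hypothetical helper: read "hostname => hostRef" lines on stdin and
# print the hostRef for the given overcloud host name.
# Returns 1 if the host name is not found.
host_ref_for() {
  want=$1
  while read -r name arrow ref; do
    if [ "$name" = "$want" ] && [ "$arrow" = "=>" ]; then
      printf '%s\n' "$ref"
      return 0
    fi
  done
  return 1
}

# Usage (not run here): find the BareMetalHost that backs compute-0.
# oc get openstackbaremetalset compute -n openstack -o json \
#   | jq -r '.status.baremetalHosts | to_entries[] | "\(.key) => \(.value | .hostRef)"' \
#   | host_ref_for compute-0
```

You can then pass the resulting name to the oc annotate command that marks the host for deletion.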
To change the status of the annotatedForDeletion parameter to true in the OpenStackBaremetalSet resource, annotate the BareMetalHost resource with osp-director.openstack.org/delete-host=true:
$ oc annotate -n openshift-machine-api bmh/openshift-worker-3 osp-director.openstack.org/delete-host=true --overwrite
Optional: Confirm that the annotatedForDeletion status has changed to true in the OpenStackBaremetalSet resource:
$ oc get openstackbaremetalset compute -o json -n openstack | jq .status
{
  "baremetalHosts": {
    "compute-0": {
      "annotatedForDeletion": true,
      "ctlplaneIP": "192.168.25.105/24",
      "hostRef": "openshift-worker-3",
      "hostname": "compute-0",
      "networkDataSecretName": "compute-cloudinit-networkdata-openshift-worker-3",
      "provisioningState": "provisioned",
      "userDataSecretName": "compute-cloudinit-userdata-openshift-worker-3"
    },
    "compute-1": {
      "annotatedForDeletion": false,
      "ctlplaneIP": "192.168.25.106/24",
      "hostRef": "openshift-worker-4",
      "hostname": "compute-1",
      "networkDataSecretName": "compute-cloudinit-networkdata-openshift-worker-4",
      "provisioningState": "provisioned",
      "userDataSecretName": "compute-cloudinit-userdata-openshift-worker-4"
    }
  },
  "provisioningStatus": {
    "readyCount": 2,
    "reason": "All requested BaremetalHosts have been provisioned",
    "state": "provisioned"
  }
}
Decrease the count parameter for the compute OpenStackBaremetalSet resource:
$ oc patch openstackbaremetalset compute --type=merge --patch '{"spec":{"count":1}}' -n openstack
When you reduce the resource count of the OpenStackBaremetalSet resource, you trigger the corresponding controller to handle the resource deletion, which causes the following actions:
- Director Operator deletes the corresponding IP reservations from OpenStackIPSet and OpenStackNetConfig for the node.
- Director Operator flags the IP reservation entry in the OpenStackNet resource as deleted:
  $ oc get osnet ctlplane -o json -n openstack | jq .status.reservations
  {
    "compute-0": {
      "deleted": true,
      "ip": "172.22.0.140"
    },
    "compute-1": {
      "deleted": false,
      "ip": "172.22.0.100"
    },
    "controller-0": {
      "deleted": false,
      "ip": "172.22.0.120"
    },
    "controlplane": {
      "deleted": false,
      "ip": "172.22.0.110"
    },
    "openstackclient-0": {
      "deleted": false,
      "ip": "172.22.0.251"
    }
  }
Optional: To make the IP reservations of the deleted OpenStackBaremetalSet resource available for other roles to use, set the value of the spec.preserveReservations parameter to false in the OpenStackNetConfig object.
Access the remote shell for openstackclient:
$ oc rsh openstackclient -n openstack
Remove the Compute service entries from the overcloud:
$ openstack compute service list
$ openstack compute service delete <service-id>
Check the Compute network agents entries in the overcloud and remove them if they exist:
$ openstack network agent list
$ for AGENT in $(openstack network agent list --host <scaled-down-node> -c ID -f value) ; do openstack network agent delete "$AGENT" ; done
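Because deleting network agents cannot be undone, you might want to preview the delete commands first. The following sketch reads agent IDs on stdin and, when the DRY_RUN variable is set to 1, prints each command instead of running it. The function name delete_agents and the DRY_RUN variable are hypothetical and not part of the openstack client.

```shell
# Hypothetical helper: delete each network agent whose ID arrives on
# stdin, one ID per line. Blank lines are skipped. With DRY_RUN=1 the
# commands are printed instead of executed.
delete_agents() {
  while read -r agent; do
    [ -n "$agent" ] || continue
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "openstack network agent delete $agent"
    else
      openstack network agent delete "$agent"
    fi
  done
}

# Usage (not run here): preview, then delete, the agents of one host.
# openstack network agent list --host <scaled-down-node> -c ID -f value | DRY_RUN=1 delete_agents
# openstack network agent list --host <scaled-down-node> -c ID -f value | delete_agents
```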
Exit from openstackclient:
$ exit