Chapter 10. Scaling Compute nodes with director Operator

If you require more or fewer compute resources for your overcloud, you can scale the number of Compute nodes according to your requirements.

10.1. Adding Compute nodes to your overcloud with the director Operator

To add more Compute nodes to your overcloud, you must increase the node count for the compute OpenStackBaremetalSet resource. When a new node is provisioned, a new OpenStackConfigGenerator resource is created to generate a new set of Ansible playbooks. Use the OpenStackConfig Version to create or update the OpenStackDeploy object to reapply the Ansible configuration to your overcloud

Prerequisites

  • Ensure your OpenShift Container Platform cluster is operational and you have installed the director Operator correctly.
  • Deploy and configure an overcloud that runs in your OCP cluster.
  • Ensure that you have installed the oc command line tool on your workstation.
  • Check that you have enough hosts in a ready state in the openshift-machine-api namespace. Run the oc get baremetalhosts -n openshift-machine-api command to check the hosts available. For more information on managing your bare metal hosts, see "Managing bare metal hosts"

Procedure

  1. Modify the YAML configuration for the compute OpenStackBaremetalSet and increase count parameter for the resource:

    $ oc patch osbms compute --type=merge --patch '{"spec":{"count":3}}' -n openstack
  2. The OpenStackBaremetalSet resource automatically provisions new nodes with the Red Hat Enterprise Linux base operating system. Wait until the provisioning process completes. Check the nodes periodically to determine the readiness of the nodes:

    $ oc get baremetalhosts -n openshift-machine-api
    $ oc get openstackbaremetalset
  3. Generate the Ansible Playbooks using OpenStackConfigGenerator, see Configuring overcloud software with the director Operator.

Additional resources

10.2. Removing Compute nodes from your overcloud with the director Operator

To remove a Compute node from your overcloud, you must disable the Compute node, mark it for deletion, and decrease the node count for the compute OpenStackBaremetalSet resource.

Note

If you scale the overcloud with a new node in the same role, the node reuses the host names starting with the lowest ID suffix and corresponding IP reservation.

Prerequisites

Procedure

  1. Access the remote shell for openstackclient:

    $ oc rsh -n openstack openstackclient
  2. Identify the Compute node that you want to remove:

    $ openstack compute service list
  3. Disable the Compute service on the node to prevent the node from scheduling new instances:

    $ openstack compute service set <hostname> nova-compute --disable
  4. Annotate the bare-metal node to prevent Metal3 from starting the node:

    $ oc annotate baremetalhost <node> baremetalhost.metal3.io/detached=true
    $ oc logs --since=1h <metal3-pod> metal3-baremetal-operator | grep -i detach
    $ oc get baremetalhost <node> -o json | jq .status.operationalStatus
    "detached"
    • Replace <node> with the name of the BareMetalHost resource.
    • Replace <metal3-pod> with the name of your metal3 pod.
  5. Log in to the Compute node as the root user and shut down the bare-metal node:

    [root@compute-0 ~]# shutdown -h now

    If the Compute node is not accessible, complete the following steps:

    1. Log in to a Controller node as the root user.
    2. If Instance HA is enabled, disable the STONITH device for the Compute node:

      [root@controller-0 ~]# pcs stonith disable <stonith_resource_name>
      • Replace <stonith_resource_name> with the name of the STONITH resource that corresponds to the node. The resource name uses the the format <resource_agent>-<host_mac>. You can find the resource agent and the host MAC address in the FencingConfig section of the fencing.yaml file.
    3. Use IPMI to power off the bare-metal node. For more information, see your hardware vendor documentation.
  6. Retrieve the BareMetalHost resource that corresponds to the node that you want to remove:

    $ oc get openstackbaremetalset compute -o json | jq '.status.baremetalHosts | to_entries[] | "\(.key) => \(.value | .hostRef)"'
    "compute-0, openshift-worker-3"
    "compute-1, openshift-worker-4"
  7. To change the status of the annotatedForDeletion parameter to true in the OpenStackBaremetalSet resource, annotate the BareMetalHost resource with osp-director.openstack.org/delete-host=true:

    $ oc annotate -n openshift-machine-api bmh/openshift-worker-3 osp-director.openstack.org/delete-host=true --overwrite
  8. Optional: Confirm that the annotatedForDeletion status has changed to true in the OpenStackBaremetalSet resource:

    $ oc get openstackbaremetalset compute -o json -n openstack | jq .status
    {
      "baremetalHosts": {
        "compute-0": {
          "annotatedForDeletion": true,
          "ctlplaneIP": "192.168.25.105/24",
          "hostRef": "openshift-worker-3",
          "hostname": "compute-0",
          "networkDataSecretName": "compute-cloudinit-networkdata-openshift-worker-3",
          "provisioningState": "provisioned",
          "userDataSecretName": "compute-cloudinit-userdata-openshift-worker-3"
        },
        "compute-1": {
          "annotatedForDeletion": false,
          "ctlplaneIP": "192.168.25.106/24",
          "hostRef": "openshift-worker-4",
          "hostname": "compute-1",
          "networkDataSecretName": "compute-cloudinit-networkdata-openshift-worker-4",
          "provisioningState": "provisioned",
          "userDataSecretName": "compute-cloudinit-userdata-openshift-worker-4"
        }
      },
      "provisioningStatus": {
        "readyCount": 2,
        "reason": "All requested BaremetalHosts have been provisioned",
        "state": "provisioned"
      }
    }
  9. Decrease the count parameter for the compute OpenStackBaremetalSet resource:

    $ oc patch openstackbaremetalset compute --type=merge --patch '{"spec":{"count":1}}' -n openstack

    When you reduce the resource count of the OpenStackBaremetalSet resource, you trigger the corresponding controller to handle the resource deletion, which causes the following actions:

    • Director Operator deletes the corresponding IP reservations from OpenStackIPSet and OpenStackNetConfig for the node.
    • Director Operator flags the IP reservation entry in the OpenStackNet resource as deleted:

      $ oc get osnet ctlplane -o json -n openstack | jq .status.reservations
      {
        "compute-0": {
          "deleted": true,
          "ip": "172.22.0.140"
        },
        "compute-1": {
          "deleted": false,
          "ip": "172.22.0.100"
        },
        "controller-0": {
          "deleted": false,
          "ip": "172.22.0.120"
        },
        "controlplane": {
          "deleted": false,
          "ip": "172.22.0.110"
        },
        "openstackclient-0": {
          "deleted": false,
          "ip": "172.22.0.251"
        }
  10. Optional: To make the IP reservations of the deleted OpenStackBaremetalSet resource available for other roles to use, set the value of the spec.preserveReservations parameter to false in the OpenStackNetConfig object.
  11. Access the remote shell for openstackclient:

    $ oc rsh openstackclient -n openstack
  12. Remove the Compute service entries from the overcloud:

    $ openstack compute service list
    $ openstack compute service delete <service-id>
  13. Check the Compute network agents entries in the overcloud and remove them if they exist:

    $ openstack network agent list
    $ for AGENT in $(openstack network agent list --host <scaled-down-node> -c ID -f value) ; do openstack network agent delete $AGENT ; done
  14. Exit from openstackclient:

    $ exit