Chapter 10. Scaling Compute nodes with the director Operator

If you require more or fewer compute resources for your overcloud, you can scale the number of Compute nodes according to your requirements.

10.1. Adding Compute nodes to your overcloud with the director Operator

To add more Compute nodes to your overcloud, you must increase the node count for the compute OpenStackBaremetalSet resource. When you increase the node count, the OpenStackPlaybookGenerator resource generates a new set of Ansible playbooks. You then run the tripleo-deploy.sh script to apply the new Ansible configuration to your overcloud.

Prerequisites

  • Ensure your OpenShift Container Platform cluster is operational and you have installed the director Operator correctly.
  • Deploy and configure an overcloud that runs in your OCP cluster.
  • Ensure that you have installed the oc command line tool on your workstation.
  • Check that you have enough hosts in a ready state in the openshift-machine-api namespace. Run the oc get baremetalhosts -n openshift-machine-api command to check the available hosts. For more information on managing your bare metal hosts, see "Managing bare metal hosts".
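
The host check in the last prerequisite can be scripted. The following is a minimal sketch, not part of the director Operator: the count_hosts_in_state function name is hypothetical, and the sketch assumes that each BareMetalHost resource reports its state in the status.provisioning.state field and that the state you want to count is named ready. Adjust the state name if your cluster reports a different value.

```shell
# Hypothetical helper: count BareMetalHost resources in a given state.
# Assumption: the state is exposed in .status.provisioning.state.
# usage: count_hosts_in_state <state>
count_hosts_in_state() {
  state=$1
  # Print one provisioning state per line, then count exact matches.
  # grep -c exits non-zero when the count is 0, so "|| true" keeps
  # the function usable in scripts that run with "set -e".
  oc get baremetalhosts -n openshift-machine-api \
    -o jsonpath='{range .items[*]}{.status.provisioning.state}{"\n"}{end}' |
    grep -cx -- "$state" || true
}
```

For example, running count_hosts_in_state ready prints the number of hosts in the ready state, which you can compare with the number of Compute nodes that you plan to add.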

Procedure

  1. Modify the YAML configuration for the compute OpenStackBaremetalSet and increase the count parameter for the resource:

    $ oc patch osbms compute --type=merge --patch '{"spec":{"count":3}}' -n openstack
  2. The OpenStackBaremetalSet resource automatically provisions new nodes with the Red Hat Enterprise Linux base operating system. Wait until the provisioning process completes. Check the nodes periodically to determine the readiness of the nodes:

    $ oc get baremetalhosts -n openshift-machine-api
  3. The OpenStackPlaybookGenerator resource automatically generates new Ansible playbooks for configuration. Wait until the regeneration process completes. Check the OpenStackPlaybookGenerator resource periodically to determine the readiness of the playbooks:

    $ oc describe openstackplaybookgenerator/default -n openstack
  4. Access the remote shell for openstackclient:

    $ oc rsh -n openstack openstackclient
  5. Change to the cloud-admin home directory:

    $ cd /home/cloud-admin
  6. Optional: Check the diff for the overcloud Ansible playbooks:

    $ ./tripleo-deploy.sh -d
  7. Accept the newest version of the rendered Ansible playbooks and tag them as latest:

    $ ./tripleo-deploy.sh -a
  8. Apply the Ansible playbooks against the overcloud nodes:

    $ ./tripleo-deploy.sh -p
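
Steps 2 and 3 ask you to check a resource periodically until it is ready. A polling loop can do the waiting for you. The following is a minimal sketch under stated assumptions: the wait_until and hosts_provisioned_at_least function names are hypothetical, the sketch assumes that BareMetalHost resources report their state in the status.provisioning.state field, and the count includes every provisioned host in the namespace, not only Compute nodes, so choose the target number accordingly.

```shell
# Hypothetical helper: retry a command until it succeeds or the
# attempt budget is used up.
# usage: wait_until <attempts> <sleep_seconds> <command> [args...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical check: succeeds when at least <n> hosts in the
# openshift-machine-api namespace report the "provisioned" state.
# usage: hosts_provisioned_at_least <n>
hosts_provisioned_at_least() {
  want=$1
  have=$(oc get baremetalhosts -n openshift-machine-api \
    -o jsonpath='{range .items[*]}{.status.provisioning.state}{"\n"}{end}' |
    grep -cx provisioned || true)
  [ "$have" -ge "$want" ]
}
```

For example, wait_until 60 30 hosts_provisioned_at_least 3 checks every 30 seconds for up to 60 attempts and returns a non-zero status if fewer than three hosts are provisioned when the attempts run out.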

10.2. Removing Compute nodes from your overcloud with the director Operator

To remove a Compute node from your overcloud, you must disable the Compute node, mark it for deletion, and decrease the node count for the compute OpenStackBaremetalSet resource.

Prerequisites

  • Ensure your OpenShift Container Platform cluster is operational and you have installed the director Operator correctly.
  • Deploy and configure an overcloud that runs in your OCP cluster.
  • Ensure that you have installed the oc command line tool on your workstation.

Procedure

  1. Access the remote shell for openstackclient:

    $ oc rsh -n openstack openstackclient
  2. Identify the Compute node that you want to remove and disable the Compute service on the node to prevent the scheduler from placing new instances on it:

    $ openstack compute service list
    $ openstack compute service set <hostname> nova-compute --disable
  3. Exit from openstackclient:

    $ exit
  4. Annotate the BareMetalHost resource that corresponds to the node that you want to remove with the osp-director.openstack.org/delete-host=true annotation:

    $ oc annotate -n openshift-machine-api bmh/openshift-worker-3 osp-director.openstack.org/delete-host=true --overwrite

    The annotatedForDeletion status for the node changes to true in the OpenStackBaremetalSet resource:

    $ oc get osbms compute -o json -n openstack | jq .status
    {
      "baremetalHosts": {
        "compute-0": {
          "annotatedForDeletion": true,
          "ctlplaneIP": "192.168.25.105/24",
          "hostRef": "openshift-worker-3",
          "hostname": "compute-0",
          "networkDataSecretName": "compute-cloudinit-networkdata-openshift-worker-3",
          "provisioningState": "provisioned",
          "userDataSecretName": "compute-cloudinit-userdata-openshift-worker-3"
        },
        "compute-1": {
          "annotatedForDeletion": false,
          "ctlplaneIP": "192.168.25.106/24",
          "hostRef": "openshift-worker-4",
          "hostname": "compute-1",
          "networkDataSecretName": "compute-cloudinit-networkdata-openshift-worker-4",
          "provisioningState": "provisioned",
          "userDataSecretName": "compute-cloudinit-userdata-openshift-worker-4"
        }
      },
      "provisioningStatus": {
        "readyCount": 2,
        "reason": "All requested BaremetalHosts have been provisioned",
        "state": "provisioned"
      }
    }
  5. Modify the YAML configuration for the compute OpenStackBaremetalSet resource and decrease the count parameter for the resource:

    $ oc patch osbms compute --type=merge --patch '{"spec":{"count":1}}' -n openstack

    When you reduce the resource count of the OpenStackBaremetalSet resource, you trigger the corresponding controller to handle the resource deletion, which causes the following actions:

    • The director Operator deletes the corresponding OpenStackIPSet for the node.
    • The director Operator flags the IP reservation entry in the OpenStackNet resource as deleted:

      $ oc get osnet ctlplane -o json -n openstack | jq .status.roleReservations.compute
      {
        "addToPredictableIPs": true,
        "reservations": [
          {
            "deleted": true,
            "hostname": "compute-0",
            "ip": "192.168.25.105",
            "vip": false
          },
          {
            "deleted": false,
            "hostname": "compute-1",
            "ip": "192.168.25.106",
            "vip": false
          }
        ]
      }

    The node deletion and IP reservation changes have the following consequences:

    • The IP address remains reserved and is not free for another role to use.
    • If you scale the overcloud with a new node in the same role, the node reuses the hostnames, starting with the lowest ID suffix, and the corresponding IP reservation.
    • If you delete the OpenStackBaremetalSet resource, you delete all IP reservations for the corresponding role, which means other roles can use these IP addresses.
  6. Access the remote shell for openstackclient:

    $ oc rsh openstackclient -n openstack
  7. Remove the Compute service entries for the removed node from the overcloud:

    $ openstack compute service list
    $ openstack compute service delete <service-id>
  8. Check the network agent entries for the Compute node in the overcloud and remove them if they exist:

    $ openstack network agent list
    $ for AGENT in $(openstack network agent list --host <scaled-down-node> -c ID -f value) ; do openstack network agent delete $AGENT ; done
  9. Exit from openstackclient:

    $ exit
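
The agent cleanup loop in step 8 can be wrapped in a small function so that you can reuse it for each node that you scale down. The following is a minimal sketch: the delete_agents_for_host function name is hypothetical, and the sketch uses only the openstack network agent list and openstack network agent delete commands from the procedure. Run it inside the openstackclient remote shell.

```shell
# Hypothetical helper: delete every network agent registered for one host.
# usage: delete_agents_for_host <hostname>
delete_agents_for_host() {
  host=$1
  openstack network agent list --host "$host" -c ID -f value |
    while IFS= read -r agent_id; do
      # Skip blank lines so that an empty list deletes nothing.
      [ -n "$agent_id" ] || continue
      echo "Deleting network agent $agent_id on $host"
      # Redirect stdin so the delete command cannot consume the
      # remaining agent IDs that the loop still has to read.
      openstack network agent delete "$agent_id" </dev/null
    done
}
```

For example, delete_agents_for_host <scaled-down-node> removes the agents for the node that you removed. Reading the IDs line by line and quoting each one avoids word-splitting surprises, and the function does nothing when the list is empty.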