OpenShift on OpenStack with Availability Zones: Invalid Compute ServerGroup setup during OpenShift deployment


Environment

  • OpenShift on OpenStack IPI
  • Cluster initially deployed with a version earlier than 4.14
  • Masters deployed across explicitly defined Availability Zones (via the zones field of the controlPlane machine pool in install-config.yaml)

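Issue

Two of the three master Machines have an invalid serverGroupName in their providerSpec: the value does not match the server group that the instances actually belong to in OpenStack.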

Resolution

With the bug fix, installer 4.14 and later correctly sets the same server group for all masters, regardless of the number of availability zones.

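Note that in this environment (Control Plane spread across multiple AZs), the fix is strictly incompatible with the “affinity” server group policy: that policy requires all members of the server group to run on the same compute host, which cannot be satisfied when the instances are spread across availability zones.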
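Before applying the fix, you may want to confirm that the masters' server group does not use the “affinity” policy. The sketch below assumes jq is installed and takes the JSON output of openstack server group show; the helper names are hypothetical, and because the field is reported as policy (string) or policies (list) depending on the Compute API microversion, both forms are accepted.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (assumption: jq is installed).

# Print the policy of a server group.
# stdin: output of `openstack server group show <server_group_name> -f json`.
# The field is "policy" (string) or "policies" (list) depending on the
# Compute API microversion, so both shapes are handled.
server_group_policy() {
  jq -r '(.policy // .policies) | if type == "array" then .[0] else . end'
}

# Exit 0 unless the policy is the strict "affinity" policy, which cannot be
# satisfied by masters spread across several availability zones.
policy_allows_multi_az() {
  [ "$(server_group_policy)" != "affinity" ]
}
```

For example: openstack server group show <server_group_name> -f json | policy_allows_multi_az || echo "affinity policy: incompatible with multiple AZs"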

For clusters deployed with the IPI method before 4.14, with masters spread across multiple availability zones, you need to manually update the Machine resources so that they reflect the actual state of the instances.

Note: editing the Machine resources will not trigger a rollout of the Control plane instances, because in-place edits of the Machine resources are not acted upon by any OpenShift operator. However, this in-place edit is necessary in order for the cluster-control-plane-machine-set-operator to correctly generate a ControlPlaneMachineSet for your cluster.

To do that, edit the Machine resources of master-1 and master-2, and set spec.providerSpec.value.serverGroupName to the value of master-0’s spec.providerSpec.value.serverGroupName:

oc edit machine/<cluster_id>-master-1 -n openshift-machine-api
<make edits>
oc edit machine/<cluster_id>-master-2 -n openshift-machine-api
<make edits>

Here is an example of the spec.providerSpec.value section of a master Machine; serverGroupName is the only field to change:

    availabilityZone: az0
    cloudName: openstack
    cloudsSecret:
      name: openstack-cloud-credentials
      namespace: openshift-machine-api
    flavor: m1.xlarge
    image: rhcos-4.14
    kind: OpenstackProviderSpec
    metadata:
      creationTimestamp: null
    networks:
    - filter: {}
      subnets:
      - filter:
          name: refarch-lv7q9-nodes
          tags: openshiftClusterID=refarch-lv7q9
    securityGroups:
    - filter: {}
      name: refarch-lv7q9-master
    serverGroupName: refarch-lv7q9-master-az0  # <---- CHANGE ME: set to master-0's value
    serverMetadata:
      Name: refarch-lv7q9-master
      openshiftClusterID: refarch-lv7q9
    tags:
    - openshiftClusterID=refarch-lv7q9
    trunk: true
    userDataSecret:
      name: master-user-data
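As a non-interactive alternative to oc edit, the same change can be applied with a JSON merge patch. This is a sketch under stated assumptions, not a supported procedure: it assumes jq is installed, that the Machines follow the installer's <cluster_id>-master-N naming, and that master-0 holds the correct value; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (assumptions: oc is logged in with cluster-admin
# rights, jq is installed, Machines are named <cluster_id>-master-N).

# Build a JSON merge patch that only sets spec.providerSpec.value.serverGroupName.
server_group_patch() {
  jq -cn --arg sg "$1" '{spec: {providerSpec: {value: {serverGroupName: $sg}}}}'
}

# Copy master-0's serverGroupName onto master-1 and master-2.
# Usage: align_master_server_groups <cluster_id>
align_master_server_groups() {
  local cluster_id="$1" ns="openshift-machine-api" sg m
  sg="$(oc get machine "${cluster_id}-master-0" -n "$ns" \
    -o jsonpath='{.spec.providerSpec.value.serverGroupName}')" || return 1
  if [ -z "$sg" ]; then
    echo "master-0 has no serverGroupName; aborting" >&2
    return 1
  fi
  for m in 1 2; do
    # Like oc edit, this only changes the Machine object; per the note
    # above, it does not trigger a rollout of the instance.
    oc patch machine "${cluster_id}-master-${m}" -n "$ns" \
      --type=merge -p "$(server_group_patch "$sg")" || return 1
  done
}
```

For example, run align_master_server_groups refarch-lv7q9, then rerun the command from Diagnostic Steps to confirm that all three masters report the same value.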

If you edited or recreated your Control Plane Machine resources after installation, adapt these steps to your situation: in your OpenStack cloud, find the server group that your Control Plane instances belong to, and set its name in the serverGroupName property of all three Control Plane Machines.
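One way to find that server group is to list the server groups with their members and search for the Nova ID of a Control Plane instance. The sketch below is hypothetical: it assumes jq is installed and that openstack server group list --long -f json reports Name and Members columns, with Members rendered either as a list or as a comma-separated string depending on the client version.

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption: jq is installed; column names "Name" and
# "Members" are assumed from `openstack server group list --long`).
# stdin: output of `openstack server group list --long -f json`.
# Prints the name of every server group whose members include the given server ID.
server_groups_containing() {
  jq -r --arg id "$1" '
    .[]
    | (.Members
       | if type == "array" then .
         else (tostring | split(",") | map(gsub("\\s"; ""))) end) as $m
    | select(any($m[]; . == $id))
    | .Name'
}
```

For example, assuming the Nova server carries the same name as its Machine: id="$(openstack server show <cluster_id>-master-0 -f value -c id)", then openstack server group list --long -f json | server_groups_containing "$id".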

Once all three Control Plane Machine resources have the same, correct serverGroupName, your control plane can be managed by the Control Plane Machine Set Operator, which then creates a ControlPlaneMachineSet (CPMS).

The generated CPMS is created in the Inactive state. It is up to you to review it and set its spec.state to Active when ready:

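oc describe controlplanemachineset.machine.openshift.io cluster --namespace openshift-machine-api
oc edit controlplanemachineset.machine.openshift.io cluster --namespace openshift-machine-api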
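Activation can also be scripted once you have reviewed the CPMS. Because an Active CPMS may start replacing control plane machines whose spec differs from its template, the sketch below only acts when the CPMS exists and is still Inactive. It assumes jq is installed and that the CPMS is the singleton named cluster in openshift-machine-api; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (assumptions: oc is logged in with cluster-admin
# rights, jq is installed).

# Print spec.state of a ControlPlaneMachineSet read as JSON from stdin.
cpms_state() {
  jq -r '.spec.state // "Unknown"'
}

# Set the CPMS to Active, but only if it exists and is currently Inactive.
activate_cpms() {
  local ns="openshift-machine-api" json state
  json="$(oc get controlplanemachineset.machine.openshift.io cluster -n "$ns" -o json)" || {
    echo "no ControlPlaneMachineSet yet; check the serverGroupName values first" >&2
    return 1
  }
  state="$(printf '%s' "$json" | cpms_state)"
  if [ "$state" != "Inactive" ]; then
    echo "CPMS state is ${state}; nothing to do" >&2
    return 0
  fi
  oc patch controlplanemachineset.machine.openshift.io cluster -n "$ns" \
    --type=merge -p '{"spec":{"state":"Active"}}'
}
```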

Root Cause

If the masters are configured with Availability Zones (AZs), the installer (via Terraform) creates a single server group in OpenStack (the one initially created for master-0, whose name ends with master-0’s AZ) but configures each Machine providerSpec with a different server group, one per AZ.

For example, given an install-config.yaml with three zones in the controlPlane machine pool, the installer creates all three Nova instances in the same server group, but at the same time generates a different serverGroupName value for each Machine resource. The actual OpenStack server group is the one named in the master-0 Machine resource; master-1 and master-2 each have bogus values for the serverGroupName property.
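The mismatch described above can be made visible by comparing each master Machine with master-0. This is a sketch assuming jq is installed, the standard machine.openshift.io/cluster-api-machine-role label, and the installer's default naming in which master-0’s Machine name ends in -master-0; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption: jq is installed).
# stdin: output of `oc get machine -n openshift-machine-api -o json`.
# Prints "<machine> <serverGroupName>" for every master whose serverGroupName
# differs from master-0's. Prints nothing if no Machine name ends in "-master-0".
masters_differing_from_master0() {
  jq -r '
    [.items[]
     | select(.metadata.labels["machine.openshift.io/cluster-api-machine-role"] == "master")]
    as $masters
    | ($masters[]
       | select(.metadata.name | endswith("-master-0"))
       | .spec.providerSpec.value.serverGroupName) as $ref
    | $masters[]
    | select(.spec.providerSpec.value.serverGroupName != $ref)
    | "\(.metadata.name) \(.spec.providerSpec.value.serverGroupName)"'
}
```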

This anomaly was reported as OCPBUGS-13300.

Diagnostic Steps

To check whether the masters have different serverGroupName values, you can run:

oc get -n openshift-machine-api machine -o json | jq -r '.items[] | select(.metadata.labels["machine.openshift.io/cluster-api-machine-role"] == "master") | [.metadata.name, .spec.providerSpec.value.serverGroupName] | join(": ")'
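To turn this check into a pass/fail test (for example, before an upgrade), the distinct values can be counted. This is a sketch assuming jq is installed and the machine.openshift.io/cluster-api-machine-role label; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (assumption: jq is installed).
# stdin: output of `oc get machine -n openshift-machine-api -o json`.

# Print each distinct serverGroupName used by master Machines, one per line.
master_server_groups() {
  jq -r '[.items[]
          | select(.metadata.labels["machine.openshift.io/cluster-api-machine-role"] == "master")
          | .spec.providerSpec.value.serverGroupName]
         | unique | .[]'
}

# Exit 0 when all masters share exactly one serverGroupName, 1 otherwise.
masters_share_server_group() {
  [ "$(master_server_groups | grep -c .)" = "1" ]
}
```

For example: oc get -n openshift-machine-api machine -o json | masters_share_server_group || echo "masters use different server groups"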

If the names are not all identical (which happens when an IPI cluster with masters deployed on Availability Zones was installed before 4.14 and is upgraded to OCP 4.14), the following error appears in the control-plane-machine-set-operator logs in the openshift-machine-api namespace:

controller.go:329  "msg"="Reconciler error" "error"="error reconciling control plane machine set: unable to generate control plane machine set: unable to generate control plane machine set spec: failed to check OpenStack machines ServerGroup: machine refarch-lv7q9-master-1 has a different ServerGroup than the newest machine" 
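To pull the affected Machine names out of a long log, a plain sed filter is enough. This sketch assumes the operator runs as the control-plane-machine-set-operator deployment (if its pod has more than one container, pass -c with the operator container's name to oc logs); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper; uses only sed.
# stdin: control-plane-machine-set-operator log text.
# Prints the name of every Machine reported with a mismatched ServerGroup.
mismatched_machines() {
  sed -n 's/.*machine \([^ ]*\) has a different ServerGroup.*/\1/p'
}
```

For example: oc logs -n openshift-machine-api deployment/control-plane-machine-set-operator | mismatched_machines | sort -u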

At this stage, no ControlPlaneMachineSet (CPMS) has been created, but the operator is healthy and ready.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.