How to disable the ControlPlaneMachineSet when control plane nodes are in a failure domain in RHOCP 4?

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Container Platform 4
  • VMware vSphere IPI

Issue

  • Is there a way to disable the ControlPlaneMachineSet in RHOCP 4?
  • How to disable the ControlPlaneMachineSet when it returns errors related to failureDomains?

Resolution

  • Take a backup of the ControlPlaneMachineSet CR:
$ oc get controlplanemachineset.machine.openshift.io cluster -n openshift-machine-api -o yaml > CPMS.yaml
  • The .spec.state field in an activated ControlPlaneMachineSet custom resource (CR) cannot be changed from Active to Inactive. To disable the control plane machine set, delete the CR so that it is removed from the cluster. After the deletion, the Control Plane Machine Set Operator recreates the CR in the Inactive state with default settings.
$ oc delete controlplanemachineset.machine.openshift.io cluster -n openshift-machine-api
  • Edit the ControlPlaneMachineSet to remove the network and workspace fields and save it.
$ oc -n openshift-machine-api edit controlplanemachineset.machine.openshift.io cluster
# Remove the network and workspace fields under spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value
  • Verify the configuration in the CR is correct after editing the ControlPlaneMachineSet:
$ oc -n openshift-machine-api get controlplanemachineset.machine.openshift.io cluster -oyaml
  • When the configuration is correct, activate the CR by setting the .spec.state field to Active and saving your changes. For more information, refer to the documentation.

  • If the webhook causes issues while disabling the ControlPlaneMachineSet, refer to the related KCS article.
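
The edit and activation steps above can also be sketched non-interactively. The following is a minimal, untested sketch and not part of the verified procedure: the helper function names are hypothetical, the field path prefix is taken from the webhook error message shown under Diagnostic Steps, and it assumes the CR has already been deleted and recreated in the Inactive state.

```shell
# Field path prefix, taken from the webhook error message in this article.
CPMS_VALUE_PATH="/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value"

# Hypothetical helper: run an oc verb against the cluster CPMS in openshift-machine-api.
cpms() {
  local verb="$1"; shift
  oc -n openshift-machine-api "$verb" controlplanemachineset.machine.openshift.io cluster "$@"
}

# Print the current .spec.state (assumed to be Inactive before editing).
cpms_state() {
  cpms get -o jsonpath='{.spec.state}'
}

# Remove the network and workspace fields with a JSON patch.
# A JSON patch "remove" op requires its path to exist, so the patch is
# rejected as a whole if either field is already absent.
cpms_strip_failure_domain_fields() {
  cpms patch --type=json -p "[
    {\"op\": \"remove\", \"path\": \"${CPMS_VALUE_PATH}/network\"},
    {\"op\": \"remove\", \"path\": \"${CPMS_VALUE_PATH}/workspace\"}
  ]"
}

# Activate the CPMS once the configuration has been verified.
cpms_activate() {
  cpms patch --type=merge -p '{"spec":{"state":"Active"}}'
}
```

A possible sequence is to run cpms_state, then cpms_strip_failure_domain_fields, review the CR with the verification command above, and only then run cpms_activate. Keeping activation as a separate function preserves the manual review step between editing and activating.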

Root Cause

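  • The failureDomains option for the ControlPlaneMachineSet object in vSphere was introduced in RHOCP 4.16.
  • By default, the infrastructure is already configured to use the failureDomains option, so the admission webhook rejects a ControlPlaneMachineSet that also sets the network and workspace fields.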

Diagnostic Steps

  • When applying a ControlPlaneMachineSet from a CPMS.yaml file, the admission webhook complains about failure domains, even though they do not appear to be in use, and returns the following error message:
  ControlPlaneMachineSet/openshift-machine-api/cluster dry-run failed (Forbidden): admission webhook "controlplanemachineset.machine.openshift.io" denied the request: [spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.network: Internal error: network devices should not be set when control plane nodes are in a failure domain: []v1beta1.NetworkDeviceSpec{v1beta1.NetworkDeviceSpec{NetworkName:"10.x.x.x-22 OCP-Infratest", Gateway:"", IPAddrs:[]string(nil), Nameservers:[]string(nil), AddressesFromPools:[]v1beta1.AddressesFromPool(nil)}}, spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.workspace: Internal error: workspace fields should not be set when control plane nodes are in a failure domain: &v1beta1.Workspace{Server:"abc.xx.net", Datacenter:"xxx", Folder:"/xxxx/vm/03 - Test/ocp-infratest", Datastore:"/xxx/datastore/OCP_Infratest", ResourcePool:"/xxx/host/Intern//Resources"}]
  • The cluster is originally configured with the failureDomains option:
$ oc get infrastructure/cluster -o yaml
...
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: VSphere
    vsphere:
      apiServerInternalIPs:
      - 10.x.x.x
      failureDomains:    # <-- failure domains are configured here
      - name: zone1
        region: de
        server: abc.xx.net
        topology:
          computeCluster: /xxx/host/Intern
          datacenter: xxxx
          datastore: /xxx/datastore/OCP_Infratest
          folder: /xxx/vm/03 - Test/ocp-infratest
          networks:
          - 10.x.x.x OCP-Infratest
          resourcePool: /xxx/host/Intern//Resources
        zone: zone1
      ingressIPs:
      - 10.x.x.x
      machineNetworks: []
      nodeNetworking:
        external: {}
        internal: {}
      vcenters:
      - datacenters:
        - xxxx
        port: ab
        server: abc.xx.net
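
Rather than reading the full YAML, the same check can be narrowed to the failure domain names. This is a small sketch under the assumption that the cluster uses the vSphere platform; the function name is hypothetical, and the JSONPath follows the field layout shown in the output above.

```shell
# Print the names of the vSphere failure domains defined in the
# cluster-wide Infrastructure object (empty output means none are set).
list_failure_domains() {
  oc get infrastructure/cluster \
    -o jsonpath='{.spec.platformSpec.vsphere.failureDomains[*].name}'
}
```

Any non-empty output from list_failure_domains indicates that control plane nodes are in a failure domain, which is the condition under which the webhook rejects the network and workspace fields.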

