How to disable the ControlPlaneMachineSet when control plane nodes are in a failure domain in RHOCP 4?
Environment
- Red Hat OpenShift Container Platform 4
- VMware vSphere IPI
Issue
- Is there a way to disable the ControlPlaneMachineSet in RHOCP 4?
- How to disable the ControlPlaneMachineSet due to errors related to failureDomains?
Resolution
- Take a backup of the ControlPlaneMachineSet CR:
$ oc get controlplanemachineset.machine.openshift.io cluster -n openshift-machine-api -o yaml > CPMS.yaml
- Set the ControlPlaneMachineSet object to Inactive. Note that the .spec.state field in an activated ControlPlaneMachineSet custom resource (CR) cannot be changed from Active to Inactive; to disable the control plane machine set, delete the CR so that it is removed from the cluster.
$ oc delete controlplanemachineset.machine.openshift.io cluster -n openshift-machine-api
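On platforms where the ControlPlaneMachineSet is generated automatically, the Operator typically recreates the CR in the Inactive state after deletion; this can be verified, for example, with:
$ oc -n openshift-machine-api get controlplanemachineset.machine.openshift.io cluster -o jsonpath='{.spec.state}{"\n"}'
Inactive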
- Edit the ControlPlaneMachineSet to remove the network and workspace fields and save it (see the sketch below for the fields in question).
$ oc -n openshift-machine-api edit controlplanemachineset.machine.openshift.io cluster
# remove the network and workspace fields
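For reference, the fields to remove sit under the providerSpec of the embedded machine template; a trimmed sketch based on the webhook error shown in the Diagnostic Steps (values illustrative):
spec:
  template:
    machines_v1beta1_machine_openshift_io:
      spec:
        providerSpec:
          value:
            network:        # remove this entire block
              devices:
              - networkName: 10.x.x.x-22 OCP-Infratest
            workspace:      # remove this entire block
              datacenter: xxx
              datastore: /xxx/datastore/OCP_Infratest
              folder: /xxxx/vm/03 - Test/ocp-infratest
              resourcePool: /xxx/host/Intern//Resources
              server: abc.xx.net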
- Verify the configuration in the CR is correct after editing the ControlPlaneMachineSet:
$ oc -n openshift-machine-api get controlplanemachineset.machine.openshift.io cluster -o yaml
- When the configuration is correct, activate the CR by setting the .spec.state field to Active and saving your changes. For more information, refer to the product documentation.
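One non-interactive way to set the state, equivalent to editing the CR and saving, is a merge patch, for example:
$ oc -n openshift-machine-api patch controlplanemachineset.machine.openshift.io cluster --type merge -p '{"spec":{"state":"Active"}}'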
- If there are issues while disabling the ControlPlaneMachineSet due to the admission webhook, refer to the related KCS solution.
Root Cause
- The failureDomains option for the ControlPlaneMachineSet object on vSphere was introduced in RHOCP 4.16.
- The infrastructure resource is already configured to use the failureDomains option by default, so the generated CPMS uses it as well.
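The configured failure domains can be confirmed directly from the infrastructure resource, for example:
$ oc get infrastructure cluster -o jsonpath='{.spec.platformSpec.vsphere.failureDomains[*].name}{"\n"}'
zone1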
Diagnostic Steps
- When trying to apply a ControlPlaneMachineSet from the CPMS.yaml file, the admission webhook denies the request because the network and workspace fields must not be set while failure domains are in use, and returns the following error message:
ControlPlaneMachineSet/openshift-machine-api/cluster dry-run failed (Forbidden): admission webhook "controlplanemachineset.machine.openshift.io" denied the request: [spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.network: Internal error: network devices should not be set when control plane nodes are in a failure domain: []v1beta1.NetworkDeviceSpec{v1beta1.NetworkDeviceSpec{NetworkName:"10.x.x.x-22 OCP-Infratest", Gateway:"", IPAddrs:[]string(nil), Nameservers:[]string(nil), AddressesFromPools:[]v1beta1.AddressesFromPool(nil)}}, spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.workspace: Internal error: workspace fields should not be set when control plane nodes are in a failure domain: &v1beta1.Workspace{Server:"abc.xx.net", Datacenter:"xxx", Folder:"/xxxx/vm/03 - Test/ocp-infratest", Datastore:"/xxx/datastore/OCP_Infratest", ResourcePool:"/xxx/host/Intern//Resources"}]
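The same rejection can be reproduced without changing the cluster by running a server-side dry run against the backup file, for example:
$ oc apply -f CPMS.yaml --dry-run=server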
- The cluster is originally configured with the failureDomains option:
$ oc get infrastructure/cluster -o yaml
...
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: VSphere
    vsphere:
      apiServerInternalIPs:
      - 10.x.x.x
      failureDomains:                # ---------> here
      - name: zone1
        region: de
        server: abc.xx.net
        topology:
          computeCluster: /xxx/host/Intern
          datacenter: xxxx
          datastore: /xxx/datastore/OCP_Infratest
          folder: /xxx/vm/03 - Test/ocp-infratest
          networks:
          - 10.x.x.x OCP-Infratest
          resourcePool: /xxx/host/Intern//Resources
        zone: zone1
      ingressIPs:
      - 10.x.x.x
      machineNetworks: []
      nodeNetworking:
        external: {}
        internal: {}
      vcenters:
      - datacenters:
        - xxxx
        port: ab
        server: abc.xx.net