Disabling the ControlPlaneMachineSet by deleting the CR is blocked by webhook and the CR is stuck in deletion phase in RHOCP 4
Environment
- Red Hat OpenShift Container Platform 4
- VMware vSphere IPI
Issue
- The ControlPlaneMachineSet CR is stuck in the deletion phase because an admission webhook blocks the request.
- After an upgrade from RHOCP 4.15.22 to 4.16.4, operations on the ControlPlaneMachineSet fail because the admission webhook denies the request:
ControlPlaneMachineSet/openshift-machine-api/cluster dry-run failed (Forbidden): admission webhook "controlplanemachineset.machine.openshift.io" denied the request: [spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.network: Internal error: network devices should not be set when control plane nodes are in a failure domain: []v1beta1.NetworkDeviceSpec{v1beta1.NetworkDeviceSpec{NetworkName:"10.x.x.x-22 OCP-Infratest", Gateway:"", IPAddrs:[]string(nil), Nameservers:[]string(nil), AddressesFromPools:[]v1beta1.AddressesFromPool(nil)}}, spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.workspace: Internal error: workspace fields should not be set when control plane nodes are in a failure domain: &v1beta1.Workspace{Server:"abc.xx.net", Datacenter:"xxx", Folder:"/xxxx/vm/03 - Test/ocp-infratest", Datastore:"/xxx/datastore/OCP_Infratest", ResourcePool:"/xxx/host/Intern//Resources"}]
Resolution
Delete the validatingwebhookconfiguration controlplanemachineset.machine.openshift.io. With the webhook removed, the pending deletion of the ControlPlaneMachineSet CR can complete:
$ oc delete validatingwebhookconfiguration controlplanemachineset.machine.openshift.io
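Before and after removing the webhook configuration, it can help to confirm whether the CR is actually marked for deletion. The following is a minimal sketch, not an official procedure: the helper name cpms_deletion_state is hypothetical, and the CR name cluster is taken from the error message in the Issue section.

```shell
# Classify a ControlPlaneMachineSet by its metadata.deletionTimestamp value.
# Argument: the timestamp string (empty when the CR is not being deleted).
# cpms_deletion_state is a hypothetical helper name used for illustration.
cpms_deletion_state() {
  if [ -n "$1" ]; then
    echo "pending deletion since $1"
  else
    echo "not marked for deletion"
  fi
}

# Against a live cluster (assumes the CR is named "cluster"):
#   ts=$(oc get controlplanemachineset cluster -n openshift-machine-api \
#        -o jsonpath='{.metadata.deletionTimestamp}')
#   cpms_deletion_state "$ts"
```

A non-empty timestamp before the webhook is deleted suggests the CR is stuck in the deletion phase. A NotFound error from oc get afterwards suggests the deletion has completed.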
Root Cause
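- The ControlPlaneMachineSet custom resource is not deleted because the validatingwebhookconfiguration controlplanemachineset.machine.openshift.io denies the operator's updates to the CR: the provider spec sets network and workspace fields while the control plane nodes are in a failure domain.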
Diagnostic Steps
- Check if the cluster is configured with the failureDomains option:
$ oc get infrastructure/cluster -o yaml
...
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: VSphere
    vsphere:
      apiServerInternalIPs:
      - 10.x.x.x
      failureDomains:    # <--- here
      - name: zone1
        region: de
        server: abc.xx.net
        topology:
          computeCluster: /xxx/host/Intern
          datacenter: xxxx
          datastore: /xxx/datastore/OCP_Infratest
          folder: /xxx/vm/03 - Test/ocp-infratest
          networks:
          - 10.x.x.x OCP-Infratest
          resourcePool: /xxx/host/Intern//Resources
        zone: zone1
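Instead of reading the full YAML, the failure domain names can be queried directly. This is a rough sketch under the assumption that the field path matches the output shown above; count_failure_domains is a hypothetical helper, not part of oc.

```shell
# Count the failure domain names returned by a jsonpath query.
# Argument: a space-separated list of names (may be empty).
# count_failure_domains is a hypothetical helper name used for illustration.
count_failure_domains() {
  # Intentionally unquoted so the list is split on whitespace.
  set -- $1
  echo "$#"
}

# Against a live cluster:
#   names=$(oc get infrastructure/cluster \
#           -o jsonpath='{.spec.platformSpec.vsphere.failureDomains[*].name}')
#   count_failure_domains "$names"
```

A result greater than 0 indicates that the cluster is configured with the failureDomains option.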
- The control-plane-machine-set-operator pod log shows that the webhook denies updates to the ControlPlaneMachineSet CR, which leaves the CR stuck in the deletion phase:
$ oc logs control-plane-machine-set-operator-6d94bb84cd-qwhg8 -n openshift-machine-api
E0813 08:47:04.179532 1 controller.go:329] "msg"="Reconciler error" "error"="error reconciling control plane machine set: failed to update control plane machine set: admission webhook "controlplanemachineset.machine.openshift.io" denied the request: [spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.network: Internal error: network devices should not be set when control plane nodes are in a failure domain: []v1beta1.NetworkDeviceSpec{v1beta1.NetworkDeviceSpec{NetworkName:"10.x.x.x OCP-Infratest", Gateway:"", IPAddrs:[]string(nil), Nameservers:[]string(nil), AddressesFromPools:[]v1beta1.AddressesFromPool(nil)}}, spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.workspace: Internal error: workspace fields should not be set when control plane nodes are in a failure domain: &v1beta1.Workspace{Server:"s999vmvcs04.win.cs.nuernberger.net", Datacenter:"xxxx", Folder:"/xxx/vm/03 - Test/ocp-infratest", Datastore:"/xxx/ab/OCP_Infratest", ResourcePool:"/xxx/host/Intern//Resources"}]" "controller"="controlplanemachineset" "reconcileID"="abcd"
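The denial message is long, so it can be useful to pull out only the field paths that the webhook rejects. The sketch below assumes the message format shown in the log line above; rejected_fields is a hypothetical helper, and the deployment name in the usage comment is inferred from the pod name.

```shell
# Read operator log text on stdin and print each field path that the
# webhook rejected with "Internal error", one per line, without duplicates.
# rejected_fields is a hypothetical helper name used for illustration.
rejected_fields() {
  grep -o 'spec\.[A-Za-z0-9_.]*: Internal error' \
    | sed 's/: Internal error$//' \
    | sort -u
}

# Against a live cluster (deployment name assumed from the pod name):
#   oc logs deployment/control-plane-machine-set-operator \
#     -n openshift-machine-api | rejected_fields
```

For the log line above, this would list the providerSpec.value.network and providerSpec.value.workspace paths.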