Scaling a MachineSet with acceleratedNetworking causes a panic in the machine-controller on OpenShift Container Platform 4
Issue
- Recently, the MachineSet was changed to enable an accelerated networking configuration. Overall this worked, but some Machines got stuck during provisioning. Checking the openshift-machine-api namespace showed that the machine-controller was crash-looping with the stack trace below. The only way found to resolve it was to disable acceleratedNetworking and change the vmSize.

    E1003 07:32:21.097221 1 controller.go:129] Unable to set scale from zero annotations: unknown instance type: %sStandard_D8s_v5
    E1003 07:32:21.097234 1 controller.go:130] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: %v[machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]
    I1003 07:32:21.097505 1 logr.go:252] events "msg"="Warning" "message"="Failed to set autoscaling from zero annotations, instance type unknown" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"ci-ln-14rihvk-1d09d-q7862-worker-nic-centralus1","uid":"9c96244b-70db-4a43-bae0-ef3db1881b5e","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"32999"} "reason"="FailedUpdate"
    I1003 07:32:21.119355 1 controller.go:72] controllers/MachineSet "msg"="Reconciling" "machineset"="ci-ln-14rihvk-1d09d-q7862-worker-centralus1" "namespace"="openshift-machine-api"
    I1003 07:32:21.135903 1 controller.go:72] controllers/MachineSet "msg"="Reconciling" "machineset"="ci-ln-14rihvk-1d09d-q7862-worker-centralus2" "namespace"="openshift-machine-api"
    I1003 07:32:21.174299 1 reflector.go:219] Starting reflector *v1.Secret (9m50.552991678s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250
    I1003 07:32:21.174318 1 reflector.go:255] Listing and watching *v1.Secret from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250
    I1003 07:32:21.583232 1 reconciler.go:404] Provisioning state is 'Succeeded' for machine ci-ln-14rihvk-1d09d-q7862-worker-centralus3-fk8wc
    I1003 07:32:21.583258 1 controller.go:319] ci-ln-14rihvk-1d09d-q7862-worker-centralus3-fk8wc: reconciling machine triggers idempotent update
    I1003 07:32:21.583264 1 actuator.go:173] Updating machine ci-ln-14rihvk-1d09d-q7862-worker-centralus3-fk8wc
    I1003 07:32:21.738375 1 machine_scope.go:176] ci-ln-14rihvk-1d09d-q7862-worker-centralus3-fk8wc: status unchanged
    I1003 07:32:21.738424 1 machine_scope.go:192] ci-ln-14rihvk-1d09d-q7862-worker-centralus3-fk8wc: patching machine
    I1003 07:32:21.798872 1 controller.go:175] ci-ln-14rihvk-1d09d-q7862-worker-nic-centralus1-9mw88: reconciling Machine
    I1003 07:32:21.798896 1 actuator.go:213] ci-ln-14rihvk-1d09d-q7862-worker-nic-centralus1-9mw88: actuator checking if machine exists
    W1003 07:32:21.885062 1 virtualmachines.go:93] vm ci-ln-14rihvk-1d09d-q7862-worker-nic-centralus1-9mw88 not found: %!w(string=compute.VirtualMachinesClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource 'Microsoft.Compute/virtualMachines/ci-ln-14rihvk-1d09d-q7862-worker-nic-centralus1-9mw88' under resource group 'ci-ln-14rihvk-1d09d-q7862-rg' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix")
    I1003 07:32:21.885098 1 controller.go:386] ci-ln-14rihvk-1d09d-q7862-worker-nic-centralus1-9mw88: reconciling machine triggers idempotent create
    I1003 07:32:21.885107 1 actuator.go:85] Creating machine ci-ln-14rihvk-1d09d-q7862-worker-nic-centralus1-9mw88
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x18aa08e]
    goroutine 509 [running]:
    github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine.(*Reconciler).createNetworkInterface(0xc0000b4100, {0x1fa31c8, 0xc00018e310}, {0xc000d1c100, 0x39})
        /go/src/github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine/reconciler.go:508 +0x1ee
    github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine.(*Reconciler).CreateMachine(0xc0000b4100, {0x1fa31c8, 0xc00018e310})
        /go/src/github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine/reconciler.go:119 +0x105
    github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine.(*Reconciler).Create(0xc0000b4100, {0x1fa31c8, 0xc00018e310})
        /go/src/github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine/reconciler.go:97 +0x46
    github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine.(*Actuator).Create(0xc0005a1d10, {0x1, 0x1}, 0xc000b1c000)
        /go/src/github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine/actuator.go:96 +0x2c5
    github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc000119040, {0x1fa3238, 0xc000c88ab0}, {{{0xc000620540, 0x1c20180}, {0xc0008a2840, 0x30}}})
        /go/src/github.com/openshift/machine-api-provider-azure/vendor/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:387 +0xab4
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0000c0160, {0x1fa3238, 0xc000c88a80}, {{{0xc000620540, 0x1c20180}, {0xc0008a2840, 0x413af4}}})
        /go/src/github.com/openshift/machine-api-provider-azure/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114 +0x26f
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0000c0160, {0x1fa3190, 0xc00081db80}, {0x1b18d80, 0xc000144860})
        /go/src/github.com/openshift/machine-api-provider-azure/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311 +0x33e
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000c0160, {0x1fa3190, 0xc00081db80})
        /go/src/github.com/openshift/machine-api-provider-azure/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x205
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/src/github.com/openshift/machine-api-provider-azure/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85
    created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /go/src/github.com/openshift/machine-api-provider-azure/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x357
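The first two error lines in the log come from the MachineSet controller, which could not look up capacity data for Standard_D8s_v5 and asks for the scale-from-zero annotations to be populated manually. The Go program below is a minimal sketch, not a Red Hat-provided tool, that builds a JSON merge patch setting the three annotation keys named in the log on the MachineSet's metadata. The helper names (instanceCapacity, scaleFromZeroAnnotations, mergePatch) are hypothetical. The capacity values assume Azure's published size for Standard_D8s_v5 (8 vCPU, 32 GiB, no GPU) and assume memoryMb is expressed in MiB; verify both against your own vmSize.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// instanceCapacity is a hypothetical holder for the values the controller
// could not determine for the instance type.
type instanceCapacity struct {
	VCPU     int
	MemoryMb int // assumed to be MiB, as the annotation name suggests
	GPU      int
}

// scaleFromZeroAnnotations returns the three annotation keys listed in the
// controller.go:130 error message, with their values as strings.
func scaleFromZeroAnnotations(c instanceCapacity) map[string]string {
	return map[string]string{
		"machine.openshift.io/vCPU":     strconv.Itoa(c.VCPU),
		"machine.openshift.io/memoryMb": strconv.Itoa(c.MemoryMb),
		"machine.openshift.io/GPU":      strconv.Itoa(c.GPU),
	}
}

// mergePatch wraps the annotations in a JSON merge patch for metadata.annotations.
// encoding/json sorts map keys, so the output is deterministic.
func mergePatch(annotations map[string]string) ([]byte, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"annotations": annotations,
		},
	}
	return json.Marshal(patch)
}

func main() {
	// Assumed capacity of Standard_D8s_v5: 8 vCPU, 32 GiB (32768 MiB), no GPU.
	d8sV5 := instanceCapacity{VCPU: 8, MemoryMb: 32 * 1024, GPU: 0}
	patch, err := mergePatch(scaleFromZeroAnnotations(d8sV5))
	if err != nil {
		panic(err)
	}
	// prints: {"metadata":{"annotations":{"machine.openshift.io/GPU":"0","machine.openshift.io/memoryMb":"32768","machine.openshift.io/vCPU":"8"}}}
	fmt.Println(string(patch))
}
```

The printed patch could then be applied with something like `oc patch machineset <name> -n openshift-machine-api --type merge -p '<patch>'`. It targets only the autoscaling-from-zero warning, not the createNetworkInterface panic.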
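The workaround of disabling acceleratedNetworking and changing the vmSize amounts to editing two fields in the MachineSet's Azure providerSpec. The sketch below builds that edit as a JSON merge patch. It assumes the fields live at spec.template.spec.providerSpec.value.acceleratedNetworking and spec.template.spec.providerSpec.value.vmSize, as in the Azure providerSpec used by the machine API; the function name workaroundPatch is hypothetical, and the replacement size Standard_D4s_v5 is purely an illustrative choice, since the article does not say which vmSize was used.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// workaroundPatch (hypothetical helper) builds a JSON merge patch that
// disables accelerated networking and sets a different VM size in a
// MachineSet's Azure providerSpec. The field paths are an assumption.
func workaroundPatch(vmSize string) ([]byte, error) {
	value := map[string]any{
		"acceleratedNetworking": false, // merge patch sets the field to false
		"vmSize":                vmSize,
	}
	patch := map[string]any{
		"spec": map[string]any{
			"template": map[string]any{
				"spec": map[string]any{
					"providerSpec": map[string]any{"value": value},
				},
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	// Hypothetical replacement size; pick one that suits the workload.
	patch, err := workaroundPatch("Standard_D4s_v5")
	if err != nil {
		panic(err)
	}
	// prints: {"spec":{"template":{"spec":{"providerSpec":{"value":{"acceleratedNetworking":false,"vmSize":"Standard_D4s_v5"}}}}}}
	fmt.Println(string(patch))
}
```

The output could be passed to `oc patch machineset <name> -n openshift-machine-api --type merge -p '<patch>'`. Changes to a MachineSet template generally apply only to Machines created afterwards, so Machines already stuck in provisioning would presumably need to be deleted so that the MachineSet recreates them.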
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4