Machine scaling is failing in OpenShift Container Platform 4 because the cluster-autoscaler-default pod is in CrashLoopBackOff state

Issue

  • Many pods are stuck in the Pending state waiting for new Machines to be deployed, but the cluster-autoscaler-default pod is stuck in the CrashLoopBackOff state, so scaling fails and the panic below is reported.

    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1650a64]
    
    goroutine 91 [running]:
    k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).checkAttachableInlineVolume(_, {{0xc00065be20, 0x10}, {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}}, ...)
        /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:235 +0x6c4
    k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).filterAttachableVolumes(0xc00035a6c0, 0xc000672328, 0x4?, 0x1, 0xc000736570?)
        /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:175 +0x625
    k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits.(*CSILimits).Filter(0xc00035a6c0, {0x2fbf7e0?, 0x2f8e1a0?}, 0x7f62e3fe90c8?, 0xc000672328, 0xc0260c1200)
        /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/plugins/nodevolumelimits/csi.go:103 +0x2f9
    k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).runFilterPlugin(0x0?, {0x1f401c8?, 0xc00012a008?}, {0x7f6338340178?, 0xc00035a6c0?}, 0x0?, 0x0?, 0xc0004986e0?)
        /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:736 +0x2bd
    k8s.io/kubernetes/pkg/scheduler/framework/runtime.(*frameworkImpl).RunFilterPlugins(0xc00032e380, {0x1f401c8, 0xc00012a008}, 0x49?, 0x0?, 0x0?)
        /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/kubernetes/pkg/scheduler/framework/runtime/framework.go:718 +0xfa
    k8s.io/autoscaler/cluster-autoscaler/simulator.(*SchedulerBasedPredicateChecker).CheckPredicates(0xc00056f880, {0x1f4b300, 0xc000014628}, 0x49?, {0xc01e0c19d0, 0x6f})
        /go/src/k8s.io/autoscaler/cluster-autoscaler/simulator/scheduler_based_predicates_checker.go:168 +0x20d
    k8s.io/autoscaler/cluster-autoscaler/core.computeExpansionOption(0xc005c50380, {0xc019e27f80, 0xb, 0x1d?}, {0x1f4bea8?, 0xc025754e00?}, 0xc01680e240, {0xc018fd2c88, 0x0, 0x0})
        /go/src/k8s.io/autoscaler/cluster-autoscaler/core/scale_up.go:293 +0x616
    k8s.io/autoscaler/cluster-autoscaler/core.ScaleUp(0xc005c50380, 0xc000324d20, 0x2f8e1a0?, {0xc01e293900, 0x15, 0x20}, {0xc015f3c400, 0x52, 0x80}, {0xc015a39e00, ...}, ...)
        /go/src/k8s.io/autoscaler/cluster-autoscaler/core/scale_up.go:446 +0x4676
    k8s.io/autoscaler/cluster-autoscaler/core.(*StaticAutoscaler).RunOnce(0xc000001900, {0x4?, 0x3235343233363836?, 0x2f8e1a0?})
        /go/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:461 +0x1ff5
    main.run(0x34363a2273657479?, {0x1f46838, 0xc000443a40})
        /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:421 +0x2cd
    main.main.func2({0x65722d7466696873?, 0x65642d657361656c?})
        /go/src/k8s.io/autoscaler/cluster-autoscaler/main.go:508 +0x25
    created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
        /go/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:211 +0x11b
    
  • The MachineAutoscaler is failing to scale Machines because the cluster-autoscaler-default pod is stuck in the CrashLoopBackOff state; the symptoms can be confirmed with the commands below.
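
  • The following oc commands are a minimal sketch for confirming the symptoms. They assume the default layout created by the Cluster Autoscaler Operator, where the autoscaler runs as the cluster-autoscaler-default deployment in the openshift-machine-api namespace; adjust the names if they differ in the cluster.

    # List pods stuck in Pending across all namespaces
    $ oc get pods --all-namespaces --field-selector=status.phase=Pending

    # Check the state of the cluster autoscaler pod (expected: CrashLoopBackOff)
    $ oc get pods -n openshift-machine-api | grep cluster-autoscaler

    # Capture the panic string from the previously crashed container
    $ oc logs deployment/cluster-autoscaler-default -n openshift-machine-api --previous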

Environment

  • Red Hat OpenShift Container Platform 4.12 and 4.13
