Machine pool does not scale down automatically even with autoscaling enabled

Solution In Progress - Updated -

Environment

  • Red Hat OpenShift Service on AWS
    • 4.7
  • AWS

Issue

  • Machine Autoscaler has been enabled on a MachinePool in a ROSA cluster.
  • A node with low resource utilisation is not scaled down automatically.
  • Two nodes with low resource utilisation are not scaled down to the configured minimum of 1.

Resolution

Review the deployments and pods that remain on the node with low resource utilisation.

Specifically, review the resources.requests values of any Deployment or DeploymentConfig that still has pods on the underutilised node(s).

Check the cpu and memory values under resources.requests and ensure that they are not higher than what is available on the other existing nodes.

If the cpu or memory requests are higher, lower them as needed so that they fit within the resources available on the remaining nodes.

Once the requests fit, the pods can be rescheduled onto other available nodes, and the node with low resource utilisation should then scale down automatically as expected.
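The review and adjustment described above can be sketched with two small shell helpers around standard oc commands. This is a minimal sketch, not a definitive procedure: the helper names list_pod_requests and lower_requests, the project my-project, the deployment my-app, and the request values 250m and 512Mi are all hypothetical placeholders.

```shell
# List the CPU and memory requests of every pod scheduled on a given node.
# Usage: list_pod_requests <node-name>
list_pod_requests() {
  oc get pods --all-namespaces \
    --field-selector "spec.nodeName=$1" \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
}

# Lower the requests of a Deployment so its pods fit on the remaining nodes.
# Usage: lower_requests <project> <deployment> <cpu> <memory>
# All four arguments are assumptions to be replaced with real values.
lower_requests() {
  oc set resources "deployment/$2" -n "$1" --requests="cpu=$3,memory=$4"
}
```

For example, list_pod_requests worker1 shows what is still requesting resources on the node, and lower_requests my-project my-app 250m 512Mi applies smaller requests to the hypothetical deployment. For a DeploymentConfig, the same oc set resources command should work with dc/<name> in place of deployment/<name>. Changing the requests triggers a new rollout of the pods.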

Root Cause

If a MachinePool with autoscaling enabled scales down nodes as expected but some nodes remain active despite low resource utilisation, it is likely that deployments running on those nodes have resources.requests values higher than what is available on the remaining nodes.

As a result, the pods cannot be rescheduled elsewhere, and the node hosting the Deployment or DeploymentConfig with the high resources.requests values cannot be scaled down automatically.
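The arithmetic behind this can be illustrated with a small, self-contained sketch. A pod can only move to another node if its request is no larger than that node's allocatable capacity minus the requests already placed on it. The helper names to_millicores and pod_fits and all of the numbers below are hypothetical. On a real cluster, the figures would come from the Allocatable and Allocated resources sections of oc describe node.

```shell
# Convert a Kubernetes CPU quantity ("500m", "2", "0.5") to millicores.
to_millicores() {
  awk -v q="$1" 'BEGIN {
    if (q ~ /m$/) { sub(/m$/, "", q); printf "%d\n", q + 0 }
    else          { printf "%d\n", q * 1000 + 0.5 }
  }'
}

# Succeed if a pod's CPU request fits on a node.
# Usage: pod_fits <pod-request> <node-allocatable> <already-requested>
pod_fits() {
  fit_req=$(to_millicores "$1")
  fit_alloc=$(to_millicores "$2")
  fit_used=$(to_millicores "$3")
  [ "$fit_req" -le $(( fit_alloc - fit_used )) ]
}

# Hypothetical remaining node: 3500m allocatable, 1200m already requested,
# so 2300m is free.
if pod_fits 3 3500m 1200m; then echo "3 cores: fits"; else echo "3 cores: does not fit"; fi
if pod_fits 2 3500m 1200m; then echo "2 cores: fits"; else echo "2 cores: does not fit"; fi
```

In this illustration, a pod requesting 3 cores cannot be placed on the remaining node, so the node it currently runs on cannot be removed. After lowering the request to 2 cores, the pod fits. The same comparison applies to memory requests.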

Diagnostic Steps

Confirm that resource usage is low on the node(s):

$ oc adm top node | grep -E "NAME|worker1"
NAME        CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
worker1     654m         9%     3935Mi          6%
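Note that oc adm top reports actual usage, whereas the scheduler places pods according to their requests, so a node can look idle while most of its capacity is still requested. As a minimal sketch, the requested totals for a node can be read from the "Allocated resources" section of oc describe node. The helper name allocated_resources is hypothetical, and the number of lines printed after the heading (8) is an assumption that may need adjusting.

```shell
# Print the "Allocated resources" section of a node, which summarises the
# total requests and limits of the pods scheduled on it.
# Usage: allocated_resources <node-name>
allocated_resources() {
  oc describe node "$1" | grep -A 8 "Allocated resources:"
}
```

Running allocated_resources worker1 and comparing the Requests column with the usage shown by oc adm top node highlights any gap between what is requested and what is actually consumed.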

Check the cluster autoscaler logs for any errors. If the logs state that the node group min size has been reached, review "Applying autoscaling to an OpenShift Container Platform cluster" to verify that the necessary conditions for node removal by the cluster autoscaler are met. Additionally, review the Resolution section above.

$ oc logs pod/cluster-autoscaler-default -n openshift-machine-api
...
I1025 06:07:30.079913       1 static_autoscaler.go:402] No unschedulable pods
I1025 06:07:32.478040       1 pre_filtering_processor.go:66] Skipping worker1 - node group min size reached
I1025 06:07:32.479018       1 pre_filtering_processor.go:66] Skipping worker2 - node group min size reached
I1025 06:07:32.479905       1 pre_filtering_processor.go:66] Skipping worker3 - node group min size reached
I1025 06:07:34.877804       1 scale_down.go:868] No candidates for scale down
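When the log is long, it can help to reduce it to the nodes that were skipped and the reason given for each. The following is a minimal sketch that assumes the log lines follow the "Skipping <node> - <reason>" pattern shown above. The helper name skipped_nodes is hypothetical.

```shell
# Read cluster autoscaler log lines on stdin and print "<node>: <reason>"
# for every node that was skipped during scale-down evaluation.
skipped_nodes() {
  sed -n 's/.*Skipping \([^ ]*\) - \(.*\)$/\1: \2/p'
}
```

It can be fed from the same command used above, for example: oc logs pod/cluster-autoscaler-default -n openshift-machine-api | skipped_nodes. A reason of "node group min size reached" indicates that the node group is already at its configured minimum, so it is worth confirming the minimum replica count of the machine pool before adjusting requests.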

