Scaling OpenShift 4.x nodes up and down, and draining and rescheduling pods, during planned maintenance causes performance issues

Solution In Progress

Environment

  • Red Hat OpenShift Container Platform 4.x

Issue

  • During upgrades or other planned maintenance, the OpenShift Cluster Autoscaler attempts to scale the number of nodes in the cluster up and down
  • Scaling nodes up and down, and draining and rescheduling pods, is time-consuming and compute-intensive

Resolution

  • A Request for Feature Enhancement (RFE) has been submitted
  • RFE-3281 requests a way for a cluster admin to manually pause autoscaling so that scheduled maintenance can be conducted with less node churn

Root Cause

  • The OpenShift Cluster Autoscaler reacts to node maintenance (cordon, drain, reboot) by creating more nodes so that the drained pods can be rescheduled. This adds compute overhead and prolongs the planned maintenance or upgrade.
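
The churn described above can be illustrated with a toy model. This is a rough sketch only, not the Cluster Autoscaler's actual algorithm: the `Node` class, the `simulate_rolling_maintenance` function, the pod-slot capacity model, and the immediate scale-down are all simplifying assumptions made for illustration (the real autoscaler works on CPU and memory requests and waits before removing nodes).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int            # pod slots; a stand-in for CPU/memory (assumption)
    schedulable: bool = True


def simulate_rolling_maintenance(node_count=3, node_capacity=10, pods=27,
                                 autoscaler_enabled=True):
    """Cordon and drain one node at a time; count toy scale-up/scale-down events."""
    nodes = [Node(f"worker-{i}", node_capacity) for i in range(node_count)]
    scale_ups = scale_downs = 0

    for i in range(node_count):
        # Cordon and drain: this node's capacity is no longer available.
        nodes[i].schedulable = False

        if autoscaler_enabled:
            available = sum(n.capacity for n in nodes if n.schedulable)
            shortfall = max(0, pods - available)
            # Ceiling division: whole nodes needed to cover the shortfall.
            for _ in range(-(-shortfall // node_capacity)):
                nodes.append(Node(f"scaled-{scale_ups}", node_capacity))
                scale_ups += 1

        # Maintenance on this node is finished; uncordon it.
        nodes[i].schedulable = True

        # Toy scale-down: drop added nodes that are no longer needed.
        while (len(nodes) > node_count
               and sum(n.capacity for n in nodes) - node_capacity >= pods):
            nodes.pop()
            scale_downs += 1

    return scale_ups, scale_downs


if __name__ == "__main__":
    print(simulate_rolling_maintenance())
    print(simulate_rolling_maintenance(autoscaler_enabled=False))
```

In this model, maintaining three nodes one after another triggers a scale-up and a scale-down for every node, whereas with the hypothetical `autoscaler_enabled=False` switch no nodes are added or removed. The model ignores that, without the extra node, some drained pods would stay pending until the node under maintenance returns.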

