Overcommitted worker nodes restart or go "NotReady"

Solution Verified

Issue

  • One or more worker nodes restart unexpectedly, become NotReady, or are replaced.

  • One or more of the following symptoms are commonly observed:

    • Worker nodes restart without an obvious reason

    • Worker nodes appear as NotReady

    • Nodes are replaced automatically after becoming unhealthy

    • oc describe node shows the worker node as overcommitted

    • Pods on the affected node restart, are evicted, or become unstable

  • Why are worker nodes restarting and transitioning to NotReady?

  • Why does oc describe node show the node as overcommitted?

  • Can node overcommitment lead to pod evictions, OOMKilled events, or node instability?

  • Why are nodes automatically replaced after becoming unhealthy?
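
For context on the "overcommitted" symptom: the "Allocated resources" section of oc describe node reports summed pod requests and limits as a percentage of the node's allocatable resources, and notes that total limits over 100 percent mean the node is overcommitted. The following is a minimal sketch of that arithmetic, not the tool's actual implementation. The helper names (parse_quantity, commitment_percent) and the sample values are hypothetical, and only the common quantity suffixes are handled.

```python
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes (assumed subset).
# Two-character binary suffixes are listed first so they are matched
# before the one-character decimal suffixes.
_SUFFIXES = [
    ("Ki", Decimal(2) ** 10),
    ("Mi", Decimal(2) ** 20),
    ("Gi", Decimal(2) ** 30),
    ("Ti", Decimal(2) ** 40),
    ("k", Decimal(10) ** 3),
    ("M", Decimal(10) ** 6),
    ("G", Decimal(10) ** 9),
    ("T", Decimal(10) ** 12),
    ("m", Decimal(1) / 1000),  # milli-units, e.g. "500m" CPU = 0.5 cores
]


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as "500m", "2", or "4Gi" to a base-unit number."""
    quantity = quantity.strip()
    for suffix, multiplier in _SUFFIXES:
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * multiplier
    return Decimal(quantity)


def commitment_percent(allocatable: str, pod_values: list) -> int:
    """Return summed pod values as a truncated percentage of allocatable."""
    total = sum((parse_quantity(v) for v in pod_values), Decimal(0))
    return int(total * 100 / parse_quantity(allocatable))


if __name__ == "__main__":
    # Hypothetical node: 3500m allocatable CPU, 8Gi allocatable memory.
    cpu = commitment_percent("3500m", ["2", "1500m", "500m"])
    mem = commitment_percent("8Gi", ["4Gi", "6Gi"])
    for name, pct in (("cpu limits", cpu), ("memory limits", mem)):
        state = "overcommitted" if pct > 100 else "within allocatable"
        print(f"{name}: {pct}% ({state})")
```

In this sketch, any result above 100 means the pods' combined limits exceed what the node can provide, so the node can run short of resources if the pods approach their limits at the same time.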

Environment

  • Red Hat OpenShift Service on AWS (ROSA)
    • 4.x
  • Red Hat OpenShift Dedicated (OSD)
    • 4.x
  • Azure Red Hat OpenShift (ARO)
    • 4.x
  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.x
