Overcommitted worker nodes restart or go "NotReady"
Issue
- One or more worker nodes restart unexpectedly, become NotReady, or are replaced.
- This issue is commonly observed with one or more of the following symptoms:
  - Worker nodes restart without an obvious reason
  - Worker nodes appear as NotReady
  - Nodes are replaced automatically after becoming unhealthy
  - oc describe node shows the worker node as overcommitted
  - Pods on the affected node restart, are evicted, or become unstable
- Why are worker nodes restarting and transitioning to NotReady?
- Why does oc describe node show the node as overcommitted?
- Can node overcommitment lead to pod evictions, OOMKilled events, or node instability?
- Why are nodes automatically replaced after becoming unhealthy?
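The overcommitted symptom can be checked against the "Allocated resources" block that oc describe node prints for each node. The following is a minimal sketch, not an official tool: the sample text uses hypothetical values, the function names parse_allocated and overcommitted are made up for illustration, and the exact layout of the output may vary between versions. It treats a resource as overcommitted when its total limits exceed 100 percent of the node's allocatable capacity.

```python
import re

# Illustrative excerpt only: hypothetical values laid out like the
# "Allocated resources" block of `oc describe node <node-name>`.
SAMPLE = """\
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1450m (41%)   5 (142%)
  memory             5Gi (34%)     18Gi (120%)
  ephemeral-storage  0 (0%)        0 (0%)
"""

# One data row: resource name, requests with percentage, limits with percentage.
ROW = re.compile(
    r"^\s*(?P<name>[\w./-]+)\s+\S+\s+\((?P<req>\d+)%\)\s+\S+\s+\((?P<lim>\d+)%\)\s*$"
)


def parse_allocated(text):
    """Return {resource: (requests_pct, limits_pct)} from describe output."""
    result = {}
    in_block = False
    for line in text.splitlines():
        if line.startswith("Allocated resources:"):
            in_block = True
            continue
        if in_block:
            match = ROW.match(line)
            if match:
                result[match.group("name")] = (
                    int(match.group("req")),
                    int(match.group("lim")),
                )
            elif line and not line.startswith(" "):
                break  # reached the next top-level section
    return result


def overcommitted(alloc):
    """List resources whose total limits exceed 100% of allocatable."""
    return sorted(name for name, (_, lim) in alloc.items() if lim > 100)


if __name__ == "__main__":
    allocation = parse_allocated(SAMPLE)
    for name in overcommitted(allocation):
        print(f"{name}: limits at {allocation[name][1]}% of allocatable")
```

In practice the text passed to parse_allocated would be the saved output of oc describe node for the affected worker rather than the built-in sample. Limits above 100 percent do not by themselves prove a node will fail, but they indicate that pods on the node can collectively demand more than it can supply.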
Environment
- Red Hat OpenShift Service on AWS (ROSA)
  - 4.x
- Red Hat OpenShift Dedicated (OSD)
  - 4.x
- Azure Red Hat OpenShift (ARO)
  - 4.x
- Red Hat OpenShift Container Platform (RHOCP)
  - 4.x