Failure to wake up the RCU grace-period thread causes symptoms such as an unresponsive system, tasks hung in synchronize_rcu(), and high slab cache memory usage
Issue
- An OpenShift node goes into the NotReady state
- Pods on the OpenShift node are stuck in the Terminating state
- The server is unresponsive
- The kernel logs "INFO: task ... blocked for more than 120 seconds" messages for tasks hung in the synchronize_rcu() function
- The kernel panics with "hung_task: blocked tasks"
- Slab cache memory usage is high
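To check whether a saved kernel log matches the hung-task symptom above, a small script can pair each "blocked for more than ... seconds" report with the call trace that follows it. The following is only a minimal sketch, not a Red Hat tool: the function name find_rcu_hung_tasks and the 40-line window are assumptions chosen for illustration, and the script only looks for the string synchronize_rcu in the lines after each report.

```python
import re
import sys

# Header line printed by the kernel's hung-task detector, for example:
#   INFO: task kworker/0:1:1234 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_rcu_hung_tasks(log_text, window=40):
    """Return (comm, pid) for each hung-task report whose following lines
    mention synchronize_rcu.

    `window` is an assumed upper bound on how many lines after the report
    header are searched; the search also stops at the next report header.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = HUNG_TASK_RE.search(line)
        if not match:
            continue
        for follow in lines[i + 1 : i + 1 + window]:
            if HUNG_TASK_RE.search(follow):
                break  # next report starts; this one had no match
            if "synchronize_rcu" in follow:
                hits.append((match.group("comm"), int(match.group("pid"))))
                break
    return hits


if __name__ == "__main__":
    # Reads a kernel log from standard input and prints matching tasks.
    for comm, pid in find_rcu_hung_tasks(sys.stdin.read()):
        print(f"{comm} (pid {pid}) is blocked in synchronize_rcu()")
```

The script reads the log from standard input, so it can be fed the output of dmesg or a saved copy of the kernel log. A match only indicates that the symptom is present; it does not by itself confirm the root cause.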
Environment
- Red Hat Enterprise Linux 8.0 and 8.1
- OpenShift Container Platform 4.2, 4.3 and 4.4