Application processes hung in direct reclaim under congestion_wait() / shrink_inactive_list()
Issue
Several processes may get stuck, effectively indefinitely, in the
shrink_inactive_list()/too_many_isolated() loop in the direct reclaim path.
These processes then block other processes via a rw semaphore, and the hang cascades.
The following symptoms were observed:
- Many tasks were waiting on a semaphore.
- The semaphore holder was "numad". It was in the page reclaim code, waiting in congestion_wait().
- Other tasks were waiting in the same congestion wait queue at that time.
- The system was running low on memory and swapping slightly at that time, although the number of free pages did not
fall below the minimum watermark on all zones.
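
On a live system, one way to look for the symptoms above is to scan the kernel stacks of running tasks for the functions named in this article. The following is a minimal sketch, not a supported diagnostic tool. It assumes that /proc/<pid>/stack is available and readable (this normally requires root) and that a stack showing both congestion_wait and shrink_inactive_list is a reasonable marker for this condition. The helper names stack_has_symbols, task_name and find_reclaim_waiters are hypothetical and were chosen for illustration only.

```python
import os

# Kernel functions named in this article. Assumption: a task whose stack
# shows all of them is a candidate for the hang, not proof of it.
RECLAIM_SYMBOLS = ("congestion_wait", "shrink_inactive_list")


def stack_has_symbols(stack_text, symbols=RECLAIM_SYMBOLS):
    """Return True if every symbol appears in the kernel stack text."""
    return all(sym in stack_text for sym in symbols)


def task_name(status_text):
    """Extract the task name from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return "?"


def find_reclaim_waiters(proc_root="/proc", symbols=RECLAIM_SYMBOLS):
    """Return a sorted list of (pid, name) whose stack shows all symbols."""
    waiters = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        task_dir = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(task_dir, "stack")) as f:
                stack_text = f.read()
            with open(os.path.join(task_dir, "status")) as f:
                name = task_name(f.read())
        except (IOError, OSError):
            # The task exited, or its stack is not readable by this user.
            continue
        if stack_has_symbols(stack_text, symbols):
            waiters.append((int(entry), name))
    return sorted(waiters)
```

Calling find_reclaim_waiters() as root returns the matching (pid, name) pairs. A result that stays the same across several runs a few seconds apart is more suggestive than a single match, because tasks pass through congestion_wait() briefly during normal reclaim. The sketch reads only the main thread of each process; the other threads are under /proc/<pid>/task/<tid>/stack.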
Environment
- Red Hat Enterprise Linux 6
- Red Hat Enterprise Linux 7