Application processes hung in direct reclaim under congestion_wait() / shrink_inactive_list()
Issue
- Several processes may get stuck almost indefinitely in the
shrink_inactive_list()/too_many_isolated() loop in the direct reclaim path.
These processes then block other processes on a read-write semaphore (rwsem), and the hang cascades.
- Some of the symptoms that can be seen:
- Many tasks were waiting on a semaphore.
- The semaphore holder was "numad". It was in page reclaim code, calling congestion_wait() and waiting.
- Other tasks were waiting in the same congestion wait queue at that time.
- The system was running low on memory and swapping slightly at that time, although the number of free pages
did not fall below the minimum watermark on all zones.
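As a rough aid for spotting the symptoms listed above, the following Python sketch classifies a kernel stack dump by the function names it contains. It is only an illustration under stated assumptions: the input is assumed to be text in the usual "symbol+0xoffset/0xsize" frame format (for example as read from /proc/<pid>/stack or a sysrq-t dump), the helper names frame_symbols() and classify() are hypothetical, and the sample frames are made up rather than captured from an affected system. Note that too_many_isolated() may be inlined by the compiler and so may not appear in a real trace.

```python
import re

# Function names taken from the symptoms described in this article.
RECLAIM_FUNCS = {"shrink_inactive_list", "too_many_isolated"}
WAIT_FUNC = "congestion_wait"

# Matches the symbol part of a frame such as "congestion_wait+0x7a/0xd0".
_FRAME = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+")


def frame_symbols(stack_text):
    """Return the function names found in a stack dump, in order of appearance."""
    return [m.group(1) for m in _FRAME.finditer(stack_text)]


def classify(stack_text):
    """Label a stack as 'direct-reclaim-wait', 'congestion-wait', or 'other'."""
    syms = frame_symbols(stack_text)
    in_wait = WAIT_FUNC in syms
    in_reclaim = any(s in RECLAIM_FUNCS for s in syms)
    if in_wait and in_reclaim:
        return "direct-reclaim-wait"
    if in_wait:
        return "congestion-wait"
    return "other"


# Hypothetical sample: offsets are invented for illustration only.
SAMPLE = (
    "[<0>] congestion_wait+0x7a/0xd0\n"
    "[<0>] shrink_inactive_list+0x1a5/0x5b0\n"
)

if __name__ == "__main__":
    print(classify(SAMPLE))
```

Many tasks labelled "direct-reclaim-wait" at the same time, with further tasks blocked on a semaphore held by one of them, would be consistent with the symptoms above. It is a hint rather than a confirmation of this issue. Reading /proc/<pid>/stack normally requires root privileges.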
Environment
- Red Hat Enterprise Linux 6
- Red Hat Enterprise Linux 7
