Application processes hung in direct reclaim under congestion_wait() / shrink_inactive_list()

Solution Verified - Updated -

Issue

  • Several processes can get stuck, effectively forever, in the
    shrink_inactive_list()/too_many_isolated() loop in the direct reclaim path.
    They then block other processes on a read-write semaphore, and the hang cascades.

  • Some of the symptoms that can be seen:

    • Many tasks were waiting on a semaphore.
    • The semaphore holder was "numad". It was in the page reclaim code, calling congestion_wait() and waiting.
    • Other tasks were waiting in the same congestion wait queue at that time.
    • The system was running low on memory and swapping lightly at that time, although the number of free pages
      did not fall below the minimum watermark in any zone.
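
  The loop described above can be illustrated with a small user-space model. This is
  a simplified sketch, not the kernel source: the struct zone_model type, its fields,
  and the *_model function names are hypothetical. It assumes the throttle check
  compares isolated pages against inactive pages and that kswapd is exempt. The
  max_waits bound exists only so the sketch terminates; the hang in this article is
  the case where no such bound applies and the check never turns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-zone counters; the names are illustrative only. */
struct zone_model {
    unsigned long nr_inactive;  /* pages on the inactive LRU list */
    unsigned long nr_isolated;  /* pages taken off the LRU by reclaimers */
};

/*
 * Assumed throttle condition: a direct reclaimer is held back while more
 * pages are isolated than remain on the inactive list. kswapd is exempt.
 */
static bool too_many_isolated_model(const struct zone_model *z, bool is_kswapd)
{
    if (is_kswapd)
        return false;
    return z->nr_isolated > z->nr_inactive;
}

/*
 * Models the wait loop in the direct reclaim path. Each iteration stands
 * in for one congestion_wait() sleep followed by a re-check. Returns the
 * number of waits performed, or -1 if max_waits was reached while the
 * condition still held (the "stuck" case).
 */
static int direct_reclaim_throttle_model(const struct zone_model *z,
                                         int max_waits)
{
    int waits = 0;

    assert(max_waits >= 0);

    /* The counters never change here, so a true condition stays true. */
    while (too_many_isolated_model(z, false)) {
        if (waits == max_waits)
            return -1;
        waits++;
    }
    return waits;
}
```

  With counters such as nr_inactive = 0 and nr_isolated = 32, the model returns -1
  for any max_waits, which mirrors a task that keeps sleeping and re-checking while
  it still holds the semaphore other tasks are waiting on.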

Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
