Server crashed or became unresponsive with memory (physical and swap) fully utilized


Environment

  • Red Hat Enterprise Linux (all versions)

Issue

  • Server crashed with all memory fully consumed
  • System became unresponsive and vmcore analysis suggested that the system ran out of memory
  • Server found in a hung/frozen state; sar data shows memory fully utilized
  • Server rebooted due to OOM
  • Host unavailable due to Out Of Memory

Resolution

  • Complete utilization of memory and swap can cause a system to hang due to the increased I/O workload, or can cause a crash when essential system processes are killed by the Out Of Memory (OOM) killer.
  • To resolve this issue, identify the reason for the excessive memory consumption and act accordingly. Some possible actions, depending on the culprit, are listed below:
    • Add more RAM to the system.
    • Increase the swap space available to the system (a sketch of one way to do this follows this list), though this can cause a further performance impact due to frequent swapping of data in and out to disk.
    • Tune applications to use less memory where possible.
    • Reallocate the workload to another system.
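  • As an illustrative sketch only (the swap file path and size below are examples, not a recommendation for any particular system), swap can be extended on a running system with a swap file:

    # create and activate a 4 GiB swap file
    dd if=/dev/zero of=/swapfile bs=1M count=4096
    chmod 0600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    # to make it persistent across reboots, add an entry to /etc/fstab:
    #   /swapfile  swap  swap  defaults  0 0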

Root Cause

  • The system was unresponsive because it had run out of memory (physical and swap), and a large number of processes were simultaneously trying to reclaim memory.

Diagnostic Steps

  • Relevant sar performance data from the system is given below. It shows an increase in paging and swapping activity, with almost all of the physical memory and swap used up, while the amount of memory used for buffers and cache is comparatively low.
00:00:00 pswpin/s  pswpout/s
11:40:59   244.37    3371.47

00:00:00  pgpgin/s  pgpgout/s   fault/s majflt/s
11:40:59   1047.18   13686.11  14256.36    32.10


00:00:00  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
11:20:01   25136904    7694080     23.44     150976   5782872   16776792        160      0.00         0
11:40:59     150428   32680556     99.54       2428     35976          0   16776952    100.00    243696
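  • For reference, the fields above come from the standard sar swapping, paging, and memory reports. Assuming the usual sysstat log location (an assumption; DD is the day of the month), similar data can be pulled with:

    sar -W -f /var/log/sa/saDD    # swapping activity: pswpin/s, pswpout/s
    sar -B -f /var/log/sa/saDD    # paging activity: pgpgin/s, pgpgout/s, fault/s, majflt/s
    sar -r -f /var/log/sa/saDD    # memory and swap utilization: kbmemfree, %memused, %swpused, ...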
  • The vmcore analysis from a system that was manually crashed with 'sysrq+c' during a low memory period shows:
crash> kmem -i
              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM  8207746      31.3 GB         ----
      FREE    37508     146.5 MB    0% of TOTAL MEM
      USED  8170238      31.2 GB   99% of TOTAL MEM
    SHARED     8811      34.4 MB    0% of TOTAL MEM
   BUFFERS      109       436 KB    0% of TOTAL MEM
    CACHED     3053      11.9 MB    0% of TOTAL MEM
      SLAB    15598      60.9 MB    0% of TOTAL MEM

TOTAL HIGH        0            0    0% of TOTAL MEM
 FREE HIGH        0            0    0% of TOTAL HIGH
 TOTAL LOW  8207746      31.3 GB  100% of TOTAL MEM
  FREE LOW    37508     146.5 MB    0% of TOTAL LOW      <--- No free memory

TOTAL SWAP  4194238        16 GB         ----
 SWAP USED  4194237        16 GB   99% of TOTAL SWAP          
 SWAP FREE        1         4 KB    0% of TOTAL SWAP     <--- No free swap
  • There is very little memory in shared, buffers, cached and slab, so the used memory is probably almost entirely userspace pages, which would also explain why the swap was used up so quickly.
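  • One way to confirm which tasks hold that memory (a hedged sketch; the RSS column position can vary between crash versions, so check the ps header first) is to sort the crash ps output by RSS:

    crash> ps > ps.out
    crash> !sort -n -k 8 ps.out | tail -5    # column 8 is RSS (KB) in this ps output; shows the largest tasks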
  • Load is very high at 205, but there are only 49 processes in uninterruptible state, which suggests the rest are runnable processes that do not have a CPU to run on.

    crash> sys | grep LOAD
    LOAD AVERAGE: 205.62, 158.06, 104.87
    
  • There are 144 processes in the run queue:

    crash> ps | grep -c RU
    144
    
  • There are 56 processes running shrink_active_list to reclaim memory:

    crash> foreach RU bt > foreach_RU_bt.out
    crash> !grep -c shrink_active_list foreach_RU_bt.out
    56
    
  • There are a lot of processes in cond_resched(), which allows a process to yield the CPU but does not change the process's state from runnable.

    crash> !grep -c __cond_resched foreach_RU_bt.out
    125
    
  • There are 49 processes in uninterruptible "D" state:

    crash> ps | grep -c UN
    49
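    
  • Their backtraces can be collected in the same way as for the runnable tasks above, to see what those "D" state processes are blocked on (a sketch following the same foreach pattern used earlier):

    crash> foreach UN bt > foreach_UN_bt.out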
    
  • There are 112 processes in an unknown state (i.e. crash cannot determine what state they are in):

    crash> ps | grep -c ??
    112
    
  • What are they doing?

    PID: 1      TASK: ffff81082ffb37a0  CPU: 4   COMMAND: "init"
     #0 [ffff81082ffb9868] schedule at ffffffff80063035
     #1 [ffff81082ffb9940] __cond_resched at ffffffff8008bb2c
     #2 [ffff81082ffb9950] cond_resched at ffffffff8006316e
     #3 [ffff81082ffb9960] shrink_inactive_list at ffffffff800c7246
     #4 [ffff81082ffb9b50] shrink_zone at ffffffff80012d02
     #5 [ffff81082ffb9b90] try_to_free_pages at ffffffff800c8028
     #6 [ffff81082ffb9c10] __alloc_pages at ffffffff8000f271
     #7 [ffff81082ffb9c70] __do_page_cache_readahead at ffffffff80012a62
     #8 [ffff81082ffb9d60] filemap_nopage at ffffffff80013401
     #9 [ffff81082ffb9dc0] __handle_mm_fault at ffffffff80008863
    #10 [ffff81082ffb9e60] do_page_fault at ffffffff80066b9a
    #11 [ffff81082ffb9f50] error_exit at ffffffff8005dde9
        RIP: 0000000000406f5d  RSP: 00007fff3876c7e0  RFLAGS: 00010206
        RAX: 00007fff3876c980  RBX: 0000000000000008  RCX: 0000000000000000
        RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00007fff3876cb00
        RBP: 00007fff3876c980   R8: 0000000000000000   R9: 0000000000300000
        R10: 0000000000000000  R11: 00000033c107a5d3  R12: 0000000000006ffc
        R13: 000000000e2ed1bc  R14: 0000000000000000  R15: 0000000000407779
        ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
    
  • They are all in shrink_zone() trying to free up memory, and since their state never changes from runnable, they are all contending for the same resources, which seems excessive.
    • CPUs 0, 1, 3, 4, 5, 6 and 7 are all stuck on the same spinlock, which is held by CPU 2.

    PID: 5769   TASK: ffff81062d1cc7a0  CPU: 0   COMMAND: "corpact"
     #0 [ffffffff80428f30] crash_nmi_callback at ffffffff80079db6
     #1 [ffffffff80428f40] do_nmi at ffffffff80065942
     #2 [ffffffff80428f50] nmi at ffffffff80064fa7
        [exception RIP: .text.lock.spinlock+5]
        RIP: ffffffff80064cb7  RSP: ffff810157ce1910  RFLAGS: 00000286
        RAX: 0000000000000000  RBX: ffff8107fa8ff2e8  RCX: ffff810108f27088
        RDX: ffff8107fa8ff2e9  RSI: 0000000000000001  RDI: ffff8107fa8ff2e8
        RBP: 0000000000000000   R8: ffff81000002b600   R9: ffff81000002b600
        R10: 0000000057ce1bb8  R11: 0000000000000286  R12: 00000000000002bc
        R13: ffff81000002b600  R14: 0000000000000000  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    --- <NMI exception stack> ---
     #3 [ffff810157ce1910] .text.lock.spinlock at ffffffff80064cb7 (via _spin_lock)
     #4 [ffff810157ce1910] page_lock_anon_vma at ffffffff80021a5f
     #5 [ffff810157ce1920] page_referenced at ffffffff8003ba76
     #6 [ffff810157ce1960] shrink_inactive_list at ffffffff800c72c3
     #7 [ffff810157ce1b50] shrink_zone at ffffffff80012d02
     #8 [ffff810157ce1b90] try_to_free_pages at ffffffff800c8028
     #9 [ffff810157ce1c10] __alloc_pages at ffffffff8000f271
    #10 [ffff810157ce1c70] __do_page_cache_readahead at ffffffff80012a62
    #11 [ffff810157ce1d60] filemap_nopage at ffffffff80013401
    #12 [ffff810157ce1dc0] __handle_mm_fault at ffffffff80008863
    #13 [ffff810157ce1e60] do_page_fault at ffffffff80066b9a
    #14 [ffff810157ce1f50] error_exit at ffffffff8005dde9
        RIP: 00002b60f6adde23  RSP: 00000000416a7660  RFLAGS: 00010246
        RAX: 000000000041c9b0  RBX: 0000000000000000  RCX: 000000001110a008
        RDX: 00000000ffffffff  RSI: 00000000ffffffff  RDI: 00007fffb4277db0
        RBP: 00000000416a7680   R8: 0000000011109810   R9: 0000000000000001
        R10: 00000033c1351a30  R11: 0000000000000202  R12: 00007fffb4277dc0
        R13: 00007fffb4277db0  R14: 00000000416a76e0  R15: 00000000000001e0
        ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
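
  • As an aside, a per-CPU view like the one above can be produced with bt -a, which dumps the backtrace of the active task on every CPU; counting how many of those backtraces are sitting in the spinlock text section shows how many CPUs are spinning (a sketch, reusing the symbol seen in the trace above):

    crash> bt -a > bt_a.out
    crash> !grep -c ".text.lock.spinlock" bt_a.out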
    
  • So this system is excessively busy but accomplishing very little, since only one CPU is really doing anything useful here.
    To find out how many processes are in reclaim on a given zone, read that zone's reclaim_in_progress counter:

NODE: 0  ZONE: 2  ADDR: ffff81000001b600  NAME: "Normal"
  SIZE: 1310720  PRESENT: 1292800  MIN/LOW/HIGH: 1799/2248/2698
  NR_ACTIVE: 680955  NR_INACTIVE: 538921  FREE: 1770


crash> p -d ((struct zone *)0xffff81000001b600)->reclaim_in_progress
$6 = {
  counter = 268
}

So in the above case it looks like 268 processes are in page reclaim for the Normal zone.
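
The same check can be repeated for the other zones. As a sketch (assuming the crash kmem -z option is available to list each zone's struct zone address, as in the output above; <zone address> is a placeholder to fill in):

crash> kmem -z                                                # list each zone with its struct zone address
crash> p -d ((struct zone *)<zone address>)->reclaim_in_progress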

