Server crashed or became unresponsive with memory (physical and swap) fully utilised
Environment
- Red Hat Enterprise Linux (all versions)
Issue
- Server crashed with all memory fully consumed
- System became unresponsive and vmcore analysis suggested that the system ran out of memory
- Found server in hung/frozen state, SAR shows memory fully utilized
- Server rebooted due to OOM
- Host unavailable due to Out Of Memory
Resolution
- Complete utilization of physical memory and swap can cause a system to hang due to the increased I/O workload from swapping and page reclaim, or can cause a crash when system processes are killed by the Out Of Memory (OOM) killer.
- To resolve this issue, identify the reason for the excessive memory consumption and act accordingly. Depending on the culprit, possible actions include:
- Add more RAM to the system.
- Increase the swap space available to the system (see the sketch after this list), though this can further impact performance due to frequent swapping of data to and from disk.
- Tune applications to use less memory where possible.
- Reallocate the workload to another system.
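- To confirm whether the OOM killer terminated processes, check the kernel messages from around the time of the incident. A minimal sketch, assuming syslog writes kernel messages to the default /var/log/messages location:
# Check the kernel ring buffer (only useful if the system has not rebooted since)
dmesg | grep -i -E 'out of memory|oom-killer'
# Check the persisted syslog, including rotated copies
grep -i -E 'out of memory|oom-killer|killed process' /var/log/messages*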
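- A minimal sketch of increasing swap space with a swap file, as mentioned in the list above; the /swapfile path and the 4 GiB size are examples only and should be adjusted to the system:
# Create a 4 GiB file and format it as swap
dd if=/dev/zero of=/swapfile bs=1M count=4096
chmod 600 /swapfile
mkswap /swapfile
# Enable it now and make it persistent across reboots
swapon /swapfile
echo '/swapfile swap swap defaults 0 0' >> /etc/fstab
# Verify
swapon -s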
Root Cause
- The system was slow because it had run out of memory, and a large number of processes were simultaneously trying to free memory in the kernel's page-reclaim path.
Diagnostic Steps
- Relevant sar performance data from the system is given below. It shows an increase in paging and swapping activity, with almost all physical memory and swap used up; the portion of memory used for buffers and cache is comparatively low.
00:00:00 pswpin/s pswpout/s
11:40:59 244.37 3371.47
00:00:00 pgpgin/s pgpgout/s fault/s majflt/s
11:40:59 1047.18 13686.11 14256.36 32.10
00:00:00 kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
11:20:01 25136904 7694080 23.44 150976 5782872 16776792 160 0.00 0
11:40:59 150428 32680556 99.54 2428 35976 0 16776952 100.00 243696
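- The output above comes from the sysstat sar reports; a minimal sketch of the invocations that display this kind of data from a collected sa file (saDD is a placeholder for the day in question, and on newer sysstat releases swap utilization is reported by 'sar -S' rather than 'sar -r'):
# Swapping activity (pswpin/s, pswpout/s)
sar -W -f /var/log/sa/saDD
# Paging activity (pgpgin/s, pgpgout/s, fault/s, majflt/s)
sar -B -f /var/log/sa/saDD
# Memory and swap utilization (kbmemfree, %memused, kbswpused, ...)
sar -r -f /var/log/sa/saDD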
- The vmcore analysis from a system that was manually crashed with sysrq+c during a low-memory period shows:
crash> kmem -i
PAGES TOTAL PERCENTAGE
TOTAL MEM 8207746 31.3 GB ----
FREE 37508 146.5 MB 0% of TOTAL MEM
USED 8170238 31.2 GB 99% of TOTAL MEM
SHARED 8811 34.4 MB 0% of TOTAL MEM
BUFFERS 109 436 KB 0% of TOTAL MEM
CACHED 3053 11.9 MB 0% of TOTAL MEM
SLAB 15598 60.9 MB 0% of TOTAL MEM
TOTAL HIGH 0 0 0% of TOTAL MEM
FREE HIGH 0 0 0% of TOTAL HIGH
TOTAL LOW 8207746 31.3 GB 100% of TOTAL MEM
FREE LOW 37508 146.5 MB 0% of TOTAL LOW <--- No free memory
TOTAL SWAP 4194238 16 GB ----
SWAP USED 4194237 16 GB 99% of TOTAL SWAP
SWAP FREE 1 4 KB 0% of TOTAL SWAP <--- No free swap
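- A vmcore like this can be captured manually while the system is still minimally responsive; a sketch, assuming kdump is already configured and running:
# Make sure the magic SysRq key is enabled
echo 1 > /proc/sys/kernel/sysrq
# Trigger a kernel crash (the same effect as sysrq+c on the console);
# kdump then captures the vmcore and reboots the system
echo c > /proc/sysrq-trigger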
- There's very little memory in shared, buffers, cached and slab, so the used memory is probably almost entirely made up of anonymous userspace pages, which would also explain why swap was consumed so quickly.
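- This can be cross-checked from the vmcore by dumping the kernel's VM statistics; a sketch, assuming the kernel and crash versions in use expose the vm_stat counters (counter names differ between kernel versions):
crash> kmem -V
The NR_ANON_PAGES value relative to NR_FILE_PAGES indicates how much of the used memory is anonymous (userspace) rather than page cache.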
- Load average is very high at 205, but there are only 49 processes in uninterruptible state, which suggests the rest are runnable processes that don't have a CPU to run on.
crash> sys | grep LOAD
     LOAD AVERAGE: 205.62, 158.06, 104.87
- There are 144 processes in the run queue:
crash> ps | grep -c RU
144
- There are 56 processes running shrink_active_list to reclaim memory:
crash> foreach RU bt > foreach_RU_bt.out
crash> !grep -c shrink_active_list foreach_RU_bt.out
56
- There are a lot of processes in cond_resched(), which allows a process to yield the CPU but does not change the process's state from runnable:
crash> !grep -c __cond_resched foreach_RU_bt.out
125
- There are 49 processes in uninterruptible "D" state:
crash> ps | grep -c UN
49
- There are 112 processes in an unknown state (i.e. crash cannot determine what state they are in):
crash> ps | grep -c ??
112
- What are they doing?
PID: 1      TASK: ffff81082ffb37a0  CPU: 4   COMMAND: "init"
 #0 [ffff81082ffb9868] schedule at ffffffff80063035
 #1 [ffff81082ffb9940] __cond_resched at ffffffff8008bb2c
 #2 [ffff81082ffb9950] cond_resched at ffffffff8006316e
 #3 [ffff81082ffb9960] shrink_inactive_list at ffffffff800c7246
 #4 [ffff81082ffb9b50] shrink_zone at ffffffff80012d02
 #5 [ffff81082ffb9b90] try_to_free_pages at ffffffff800c8028
 #6 [ffff81082ffb9c10] __alloc_pages at ffffffff8000f271
 #7 [ffff81082ffb9c70] __do_page_cache_readahead at ffffffff80012a62
 #8 [ffff81082ffb9d60] filemap_nopage at ffffffff80013401
 #9 [ffff81082ffb9dc0] __handle_mm_fault at ffffffff80008863
#10 [ffff81082ffb9e60] do_page_fault at ffffffff80066b9a
#11 [ffff81082ffb9f50] error_exit at ffffffff8005dde9
    RIP: 0000000000406f5d  RSP: 00007fff3876c7e0  RFLAGS: 00010206
    RAX: 00007fff3876c980  RBX: 0000000000000008  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00007fff3876cb00
    RBP: 00007fff3876c980   R8: 0000000000000000   R9: 0000000000300000
    R10: 0000000000000000  R11: 00000033c107a5d3  R12: 0000000000006ffc
    R13: 000000000e2ed1bc  R14: 0000000000000000  R15: 0000000000407779
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
- They are all in shrink_zone() trying to free up memory, and since their state never changes from runnable, they are all contending for the same resources, which seems excessive.
- CPUs 0, 1, 3, 4, 5, 6 and 7 are all stuck on the same spinlock, which is held by CPU 2:
PID: 5769   TASK: ffff81062d1cc7a0  CPU: 0   COMMAND: "corpact"
 #0 [ffffffff80428f30] crash_nmi_callback at ffffffff80079db6
 #1 [ffffffff80428f40] do_nmi at ffffffff80065942
 #2 [ffffffff80428f50] nmi at ffffffff80064fa7
    [exception RIP: .text.lock.spinlock+5]
    RIP: ffffffff80064cb7  RSP: ffff810157ce1910  RFLAGS: 00000286
    RAX: 0000000000000000  RBX: ffff8107fa8ff2e8  RCX: ffff810108f27088
    RDX: ffff8107fa8ff2e9  RSI: 0000000000000001  RDI: ffff8107fa8ff2e8
    RBP: 0000000000000000   R8: ffff81000002b600   R9: ffff81000002b600
    R10: 0000000057ce1bb8  R11: 0000000000000286  R12: 00000000000002bc
    R13: ffff81000002b600  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [ffff810157ce1910] .text.lock.spinlock at ffffffff80064cb7 (via _spin_lock)
 #4 [ffff810157ce1910] page_lock_anon_vma at ffffffff80021a5f
 #5 [ffff810157ce1920] page_referenced at ffffffff8003ba76
 #6 [ffff810157ce1960] shrink_inactive_list at ffffffff800c72c3
 #7 [ffff810157ce1b50] shrink_zone at ffffffff80012d02
 #8 [ffff810157ce1b90] try_to_free_pages at ffffffff800c8028
 #9 [ffff810157ce1c10] __alloc_pages at ffffffff8000f271
#10 [ffff810157ce1c70] __do_page_cache_readahead at ffffffff80012a62
#11 [ffff810157ce1d60] filemap_nopage at ffffffff80013401
#12 [ffff810157ce1dc0] __handle_mm_fault at ffffffff80008863
#13 [ffff810157ce1e60] do_page_fault at ffffffff80066b9a
#14 [ffff810157ce1f50] error_exit at ffffffff8005dde9
    RIP: 00002b60f6adde23  RSP: 00000000416a7660  RFLAGS: 00010246
    RAX: 000000000041c9b0  RBX: 0000000000000000  RCX: 000000001110a008
    RDX: 00000000ffffffff  RSI: 00000000ffffffff  RDI: 00007fffb4277db0
    RBP: 00000000416a7680   R8: 0000000011109810   R9: 0000000000000001
    R10: 00000033c1351a30  R11: 0000000000000202  R12: 00007fffb4277dc0
    R13: 00007fffb4277db0  R14: 00000000416a76e0  R15: 00000000000001e0
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
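- Backtraces for the active task on every CPU, like the one above, can be collected in one step with crash's "bt -a"; a sketch:
crash> bt -a
Comparing the exception RIP on each CPU shows CPUs 0, 1 and 3-7 spinning in the spinlock code while CPU 2 holds the lock.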
- So this system is excessively busy but doing very little, since only one CPU is really doing anything useful here.
- To find out how many processes are performing page reclaim on a particular zone, read the zone's reclaim_in_progress counter:
NODE: 0 ZONE: 2 ADDR: ffff81000001b600 NAME: "Normal"
SIZE: 1310720 PRESENT: 1292800 MIN/LOW/HIGH: 1799/2248/2698
NR_ACTIVE: 680955 NR_INACTIVE: 538921 FREE: 1770
crash> p -d ((struct zone *)0xffff81000001b600)->reclaim_in_progress
$6 = {
counter = 268
}
So in the above case it looks like 268 processes are in page reclaim for the "Normal" zone.
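- The zone address used in the expression above can be found with crash's "kmem -z", which lists each memory zone along with its struct address, size and watermarks; a sketch (the reclaim_in_progress field exists on RHEL 5 era kernels, so this does not apply verbatim to later releases):
crash> kmem -z
Locate the "Normal" zone, note its ADDR, and substitute that address into the "p -d ((struct zone *)ADDR)->reclaim_in_progress" command.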