System panicked with "Kernel panic - not syncing: System is deadlocked on memory"
Environment
- Red Hat Enterprise Linux 9
Issue
- The server hung due to an out-of-memory situation.
- The kernel panicked after logging "Out of memory and no killable processes...".
Resolution
- Work with your application vendor to determine how many hugepages the application actually requires, and adjust the pre-allocation accordingly. Also confirm that the application is hugepages-aware.
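Before changing the pre-allocation, it can help to see how much of the hugepage pool is actually in use. The following is a minimal sketch, not a supported tool: `hugepage_summary` is a hypothetical helper name, and it assumes the standard `HugePages_Total`, `HugePages_Free`, `HugePages_Rsvd`, and `Hugepagesize` fields of `/proc/meminfo`.

```shell
#!/bin/sh
# hugepage_summary (hypothetical helper): read /proc/meminfo-formatted text
# on stdin and report how much of the hugepage pool is idle.
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total   = $2 }
        /^HugePages_Free:/  { free    = $2 }
        /^HugePages_Rsvd:/  { rsvd    = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            # Reserved pages are still counted as free, so pages that are
            # in use or promised to an application = total - free + rsvd.
            used = total - free + rsvd
            printf "total=%d free=%d rsvd=%d used_or_reserved=%d idle_kib=%d\n",
                   total, free, rsvd, used, (total - used) * size_kb
        }'
}

# Typical use on a live system:
#   hugepage_summary < /proc/meminfo
#   sysctl vm.nr_hugepages    # current pre-allocation of default-size hugepages
```

A large `idle_kib` value while the application is running at its normal load suggests that the pool is oversized or that the application is not using hugepages at all. The right value for `vm.nr_hugepages` still has to come from the application vendor.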
Root Cause
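HugePages that are pre-allocated but never used are wasted RAM: the memory is reserved and never touched, so it cannot be used by other applications or by the kernel as filesystem cache to speed up the system. In the vmcore analysed under Diagnostic Steps, 15 GiB of the 15.4 GiB of RAM was reserved for hugepages that were all free, leaving too little memory for the rest of the system.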
Diagnostic Steps
Pre-requisites
- Deploy kdump in order to collect a vmcore:
  - Vmcore analysis is required to determine whether you are affected by this issue. This first requires that a vmcore is dumped successfully.
  - If the kexec-tools package is absent or the kdump service is inactive, refer to the following article to install, enable, start, and configure kdump:
    How to troubleshoot kernel crashes, hangs, or reboots with kdump on Red Hat Enterprise Linux
- Prepare crash environment for vmcore analysis:
  - Refer to the following article to set up a vmcore analysis environment:
    How to set up a vmcore analysis environment?
Vmcore Analysis
- System Information:
    crash> sys |grep -eREL -ePAN -eLOAD
    LOAD AVERAGE: 0.39, 0.30, 0.12
         RELEASE: 5.14.0-162.6.1.el9_1.x86_64
           PANIC: "Kernel panic - not syncing: System is deadlocked on memory"
    crash> sys -i |head -5
     DMI_BIOS_VENDOR: VMware, Inc.
    DMI_BIOS_VERSION: VMW71.00V.18227214.B64.2106252220
       DMI_BIOS_DATE: 06/25/2021
      DMI_SYS_VENDOR: VMware, Inc.
    DMI_PRODUCT_NAME: VMware7,1
- Memory usage indicates that the system has a total RAM of 15.4 GiB, of which 15 GiB are pre-allocated to hugepages, but the application is not using any hugepages:
    crash> kmem -i
                     PAGES        TOTAL      PERCENTAGE
        TOTAL MEM  4027407      15.4 GB         ----
             FREE    32874     128.4 MB    0% of TOTAL MEM
             USED  3994533      15.2 GB   99% of TOTAL MEM
           SHARED     4076      15.9 MB    0% of TOTAL MEM
          BUFFERS        0            0    0% of TOTAL MEM
           CACHED    20514      80.1 MB    0% of TOTAL MEM
             SLAB    18267      71.4 MB    0% of TOTAL MEM
       TOTAL HUGE  3931648        15 GB         ----
        HUGE FREE  3931648        15 GB  100% of TOTAL HUGE
       TOTAL SWAP  4194303        16 GB         ----
        SWAP USED     6874      26.9 MB    0% of TOTAL SWAP
        SWAP FREE  4187429        16 GB   99% of TOTAL SWAP
     COMMIT LIMIT  4242182      16.2 GB         ----
        COMMITTED   212602     830.5 MB    5% of TOTAL LIMIT
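The page counts reported by kmem -i can be turned into a rough figure for how much RAM was left outside the hugepage pool. The sketch below assumes the 4 KiB base page size of x86_64; `pages_to_mib` is a hypothetical helper name, and the two page counts are copied from the TOTAL MEM and TOTAL HUGE rows of the output.

```shell
#!/bin/sh
# pages_to_mib (hypothetical helper): convert a count of 4 KiB pages to MiB,
# rounded down. The 4 KiB base page size is an assumption for x86_64.
pages_to_mib() {
    echo $(( $1 * 4 / 1024 ))
}

total_pages=4027407   # TOTAL MEM row of kmem -i
huge_pages=3931648    # TOTAL HUGE row of kmem -i

# Pages that remain for the kernel, page cache, and all processes.
non_huge_pages=$(( total_pages - huge_pages ))
# Share of RAM locked into the hugepage pool, rounded down.
huge_percent=$(( huge_pages * 100 / total_pages ))

echo "hugepage pool:        $(pages_to_mib "$huge_pages") MiB (${huge_percent}% of RAM)"
echo "outside the pool:     $(pages_to_mib "$non_huge_pages") MiB"
```

With these inputs, roughly 374 MiB (about 3% of RAM) is all that remains for the kernel and every process on the system, which is consistent with the out-of-memory condition seen in the rest of the analysis.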
- The backtrace of the panicking task shows the out_of_memory function being called from the swapoff path immediately before the panic:
    crash> bt
    PID: 1  TASK: ffffa05e40298000  CPU: 0  COMMAND: "systemd-shutdow"
     #0 [ffffafc24001f518] machine_kexec at ffffffff8aa6973d
     #1 [ffffafc24001f568] __crash_kexec at ffffffff8abbe29d
     #2 [ffffafc24001f630] panic at ffffffff8b48cb6e
     #3 [ffffafc24001f6b0] out_of_memory at ffffffff8b499873
     #4 [ffffafc24001f6d8] out_of_memory at ffffffff8acc65ed
     #5 [ffffafc24001f6f8] __alloc_pages_slowpath.constprop.0 at ffffffff8ad27cdc
     #6 [ffffafc24001f7c0] __alloc_pages at ffffffff8ad27fae
     #7 [ffffafc24001f820] alloc_pages_vma at ffffffff8ad4785f
     #8 [ffffafc24001f860] __read_swap_cache_async at ffffffff8ad305eb
     #9 [ffffafc24001f8d0] swap_cluster_readahead at ffffffff8ad30bac
    #10 [ffffafc24001f958] shmem_swapin at ffffffff8acdb747
    #11 [ffffafc24001fae8] shmem_swapin_page at ffffffff8acdd11a
    #12 [ffffafc24001fb60] shmem_unuse_inode at ffffffff8acdd73d
    #13 [ffffafc24001fd88] shmem_unuse at ffffffff8ace085f
    #14 [ffffafc24001fdd0] try_to_unuse at ffffffff8ad35369
    #15 [ffffafc24001fe38] __do_sys_swapoff at ffffffff8ad358f3
    #16 [ffffafc24001fe90] do_syscall_64 at ffffffff8b4d7169
    #17 [ffffafc24001ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b60009b
        RIP: 00007f6e37d61e1b  RSP: 00007ffe92aeadc8  RFLAGS: 00000206
        RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f6e37d61e1b
        RDX: 0000000000000000  RSI: 00007ffe92aea390  RDI: 000055a7d1ab2820
        RBP: 00007ffe92aeafd0   R8: 0000000000000000   R9: 00007ffe92aea1b0
        R10: 00007ffe92aea370  R11: 0000000000000206  R12: 000055a7d1ab0da0
        R13: 000055a7d1ab2820  R14: 000055a7d119307c  R15: 0000000000000006
        ORIG_RAX: 00000000000000a8  CS: 0033  SS: 002b
- No memory ballooning:
    crash> balloon | grep -A 3 size
      max_page_size = VMW_BALLOON_2M_PAGE,
      size = {
        counter = 0x0
      },
      target = 0x0,
- Kernel ring buffer:
    [ 140.931624] Out of memory and no killable processes...
    [ 140.932335] Kernel panic - not syncing: System is deadlocked on memory
    [ 140.933036] CPU: 0 PID: 1 Comm: systemd-shutdow Kdump: loaded Not tainted 5.14.0-162.6.1.el9_1.x86_64 #1
    [ 140.933746] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.18227214.B64.2106252220 06/25/2021
    [ 140.935187] Call Trace:
    [ 140.935894] dump_stack_lvl+0x34/0x48
    [ 140.936617] panic+0x102/0x2d4
    [ 140.937364] out_of_memory.part.0.cold+0x2f/0x7e
    [ 140.938080] out_of_memory+0x3d/0x80
    [ 140.938779] __alloc_pages_slowpath.constprop.0+0x7cc/0x8a0
    [ 140.939487] __alloc_pages+0x1fe/0x230
    [ 140.940188] alloc_pages_vma+0x8f/0x2d0
    [ 140.940886] __read_swap_cache_async+0xfb/0x2c0
    [ 140.941576] swap_cluster_readahead+0x15c/0x2c0
    [ 140.942275] shmem_swapin+0xa7/0xf0
    [ 140.942972] shmem_swapin_page+0x1aa/0x380
    [ 140.943655] shmem_unuse_inode+0x44d/0x550
    [ 140.944323] shmem_unuse+0x8f/0x190
    [ 140.944960] try_to_unuse+0x69/0x410
    [ 140.945583] __do_sys_swapoff+0x1e3/0x6c0
    [ 140.946202] do_syscall_64+0x59/0x90
    [ 140.946786] ? syscall_exit_to_user_mode+0x12/0x30
    [ 140.947381] ? do_syscall_64+0x69/0x90
    [ 140.947949] ? __rseq_handle_notify_resume+0x26/0xc0
    [ 140.948506] ? exit_to_user_mode_loop+0xf6/0x160
    [ 140.949053] ? exit_to_user_mode_prepare+0xb6/0x100
    [ 140.949587] ? syscall_exit_to_user_mode+0x12/0x30
    [ 140.950106] ? do_syscall_64+0x69/0x90
    [ 140.950608] ? sysvec_apic_timer_interrupt+0x3c/0x90
    [ 140.951105] entry_SYSCALL_64_after_hwframe+0x63/0xcd
    [ 140.951615] RIP: 0033:0x7f6e37d61e1b
    [ 140.952101] Code: 73 01 c3 48 8b 0d 05 b0 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 a8 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 af 1b 00 f7 d8 64 89 01 48
    [ 140.953057] RSP: 002b:00007ffe92aeadc8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a8
    [ 140.953532] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6e37d61e1b
    [ 140.954003] RDX: 0000000000000000 RSI: 00007ffe92aea390 RDI: 000055a7d1ab2820
    [ 140.954446] RBP: 00007ffe92aeafd0 R08: 0000000000000000 R09: 00007ffe92aea1b0
    [ 140.954901] R10: 00007ffe92aea370 R11: 0000000000000206 R12: 000055a7d1ab0da0
    [ 140.955349] R13: 000055a7d1ab2820 R14: 000055a7d119307c R15: 0000000000000006
- Out of memory events:
    crash> log | grep 'Out of memory'
    [ 139.587149] Out of memory: Killed process 6535 (swapoff) total-vm:6888kB, anon-rss:152kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:56kB oom_score_adj:0
    [ 140.007426] Out of memory: Killed process 6561 (systemd-udevd) total-vm:33192kB, anon-rss:760kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
    [ 140.931624] Out of memory and no killable processes...
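To see how little memory the OOM killer was able to free, the "Killed process" lines can be reduced to the victim name and its anonymous RSS. This is a minimal sketch that assumes the log line format shown in this vmcore; `oom_victims` is a hypothetical helper name, and the input would be kernel log text saved from crash or dmesg.

```shell
#!/bin/sh
# oom_victims (hypothetical helper): read kernel log text on stdin and print
# "<command> <anon-rss in kB>" for every "Killed process" line.
oom_victims() {
    sed -n 's/.*Killed process [0-9]* (\([^)]*\)).* anon-rss:\([0-9]*\)kB.*/\1 \2/p'
}

# oom_freed_kb (hypothetical helper): total anon-rss of all OOM victims in kB.
oom_freed_kb() {
    oom_victims | awk '{ sum += $2 } END { print sum + 0 }'
}

# Typical use with a saved kernel log (the file name is only an example):
#   oom_freed_kb < kernel.log
```

For the two victims in this vmcore (swapoff and systemd-udevd), the total is under 1 MiB, so killing them could not relieve the memory pressure.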
- Processes were using very little memory when the OOM killer was invoked:
    crash> ps -G | sed 's/^>//' | awk '{ m[$9]+=$8 } END { for (item in m) { printf "%20s %10s KiB\n", item, m[item] } }' | sort -k 2 -r -n | head
         systemd-shutdow      17604 KiB
          [zswap-shrink]          0 KiB
               [xprtiod]          0 KiB
       [xfs-reclaim/dm-]          0 KiB
         [xfs_mru_cache]          0 KiB
          [xfs-log/dm-0]          0 KiB
           [xfs-gc/dm-0]          0 KiB
         [xfs-conv/dm-0]          0 KiB
          [xfs-cil/dm-0]          0 KiB
          [xfs-buf/dm-0]          0 KiB