System panicked with "Kernel panic - not syncing: System is deadlocked on memory"

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 9

Issue

  • The server hung due to an out-of-memory condition.
  • The kernel panicked after logging "Out of memory and no killable processes...", followed by "Kernel panic - not syncing: System is deadlocked on memory".

Resolution

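  • Engage your application vendor to confirm how many hugepages the application actually requires, and reduce the pre-allocation to match (it is typically set through the vm.nr_hugepages sysctl or the hugepages= kernel boot parameter). Also confirm that the application is hugepages-aware; an application that never requests hugepages cannot use the reserved pool.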
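
Before changing the pre-allocation, it can help to quantify how much RAM the hugepage pool reserves and how much of it is in use. The following is a minimal Python sketch, not a Red Hat tool: hugepage_summary is a hypothetical helper that parses text in the /proc/meminfo format, and the sample values are reconstructed from the vmcore in the Diagnostic Steps under the assumption of 2 MiB hugepages.

```python
def hugepage_summary(meminfo_text):
    """Summarise the hugepage reservation from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # Keep only the numeric value; the optional "kB" unit is ignored.
            fields[key.strip()] = int(parts[0])

    total = fields["HugePages_Total"]
    free = fields["HugePages_Free"]
    reserved_kib = total * fields["Hugepagesize"]
    return {
        "reserved_kib": reserved_kib,
        "in_use_pages": total - free,
        "reserved_pct": round(100.0 * reserved_kib / fields["MemTotal"], 1),
    }


# Illustrative values (assumption): derived from the kmem -i output of the
# vmcore, assuming 4 KiB base pages and 2 MiB hugepages.
SAMPLE_MEMINFO = """\
MemTotal:       16109628 kB
HugePages_Total:    7679
HugePages_Free:     7679
Hugepagesize:       2048 kB
"""

summary = hugepage_summary(SAMPLE_MEMINFO)
print(summary)
```

On a live system, pass the contents of /proc/meminfo instead of the sample text. A pool that reserves most of the RAM while no pages are in use matches the condition described in this article.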

Root Cause

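HugePages that are pre-allocated but never used are wasted RAM: the memory is reserved and never touched, so it cannot be used by other applications or by the kernel as filesystem cache. On this system, 15 GiB of the 15.4 GiB of RAM was reserved for hugepages and none of it was in use, so the kernel and every process had to fit in the small remainder. When swapoff tried to bring swapped-out pages back into RAM during shutdown, the page allocation failed, the OOM killer ran out of killable processes, and the kernel panicked.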
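
To see how little memory was left outside the hugepage pool, the page counts from the kmem -i output in the Diagnostic Steps can be worked through directly. This is a small arithmetic sketch in Python; the 4 KiB base page size is an assumption that holds for x86_64, and the variable names are chosen here for illustration.

```python
PAGE_KIB = 4  # base page size in KiB (assumption: x86_64)

total_mem_pages = 4027407   # TOTAL MEM from kmem -i
total_huge_pages = 3931648  # TOTAL HUGE from kmem -i, counted in base pages
free_huge_pages = 3931648   # HUGE FREE from kmem -i

# Memory that is not locked away in the hugepage pool.
non_huge_pages = total_mem_pages - total_huge_pages
non_huge_mib = non_huge_pages * PAGE_KIB / 1024

huge_gib = total_huge_pages * PAGE_KIB / 1024 / 1024
unused_huge_pct = 100 * free_huge_pages / total_huge_pages

print(f"hugepage pool: {huge_gib:.2f} GiB, {unused_huge_pct:.0f}% unused")
print(f"left for kernel and processes: {non_huge_mib:.0f} MiB")
# prints:
#   hugepage pool: 15.00 GiB, 100% unused
#   left for kernel and processes: 374 MiB
```

Roughly 374 MiB had to hold the kernel, page cache, and all user processes, while the 15 GiB pool stayed untouched.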

Diagnostic Steps

Pre-requisites

  1. Deploy kdump in order to collect a vmcore.

  2. Prepare the crash environment for vmcore analysis.

Vmcore Analysis

  1. System Information:

    crash>  sys |grep -eREL -ePAN -eLOAD
    LOAD AVERAGE: 0.39, 0.30, 0.12
         RELEASE: 5.14.0-162.6.1.el9_1.x86_64
           PANIC: "Kernel panic - not syncing: System is deadlocked on memory"
    
    crash> sys -i |head -5
              DMI_BIOS_VENDOR: VMware, Inc.
             DMI_BIOS_VERSION: VMW71.00V.18227214.B64.2106252220
                DMI_BIOS_DATE: 06/25/2021
               DMI_SYS_VENDOR: VMware, Inc.
             DMI_PRODUCT_NAME: VMware7,1
    
  2. Memory usage shows 15.4 GiB of total RAM, of which 15 GiB is pre-allocated to hugepages. All of the hugepages are free (HUGE FREE is 100% of TOTAL HUGE), so nothing is using them:

    crash> kmem -i 
                     PAGES        TOTAL      PERCENTAGE
        TOTAL MEM  4027407      15.4 GB         ----
             FREE    32874     128.4 MB    0% of TOTAL MEM
             USED  3994533      15.2 GB   99% of TOTAL MEM
           SHARED     4076      15.9 MB    0% of TOTAL MEM
          BUFFERS        0            0    0% of TOTAL MEM
           CACHED    20514      80.1 MB    0% of TOTAL MEM
             SLAB    18267      71.4 MB    0% of TOTAL MEM
    
       TOTAL HUGE  3931648        15 GB         ----
        HUGE FREE  3931648        15 GB  100% of TOTAL HUGE
    
       TOTAL SWAP  4194303        16 GB         ----
        SWAP USED     6874      26.9 MB    0% of TOTAL SWAP
        SWAP FREE  4187429        16 GB   99% of TOTAL SWAP
    
     COMMIT LIMIT  4242182      16.2 GB         ----
        COMMITTED   212602     830.5 MB    5% of TOTAL LIMIT
    
  3. The backtrace of the panicking task shows out_of_memory() being called from a page allocation in the swapoff path, immediately before the panic:

    crash> bt
    PID: 1        TASK: ffffa05e40298000  CPU: 0    COMMAND: "systemd-shutdow"
     #0 [ffffafc24001f518] machine_kexec at ffffffff8aa6973d
     #1 [ffffafc24001f568] __crash_kexec at ffffffff8abbe29d
     #2 [ffffafc24001f630] panic at ffffffff8b48cb6e
     #3 [ffffafc24001f6b0] out_of_memory at ffffffff8b499873
     #4 [ffffafc24001f6d8] out_of_memory at ffffffff8acc65ed
     #5 [ffffafc24001f6f8] __alloc_pages_slowpath.constprop.0 at ffffffff8ad27cdc
     #6 [ffffafc24001f7c0] __alloc_pages at ffffffff8ad27fae
     #7 [ffffafc24001f820] alloc_pages_vma at ffffffff8ad4785f
     #8 [ffffafc24001f860] __read_swap_cache_async at ffffffff8ad305eb
     #9 [ffffafc24001f8d0] swap_cluster_readahead at ffffffff8ad30bac
    #10 [ffffafc24001f958] shmem_swapin at ffffffff8acdb747
    #11 [ffffafc24001fae8] shmem_swapin_page at ffffffff8acdd11a
    #12 [ffffafc24001fb60] shmem_unuse_inode at ffffffff8acdd73d
    #13 [ffffafc24001fd88] shmem_unuse at ffffffff8ace085f
    #14 [ffffafc24001fdd0] try_to_unuse at ffffffff8ad35369
    #15 [ffffafc24001fe38] __do_sys_swapoff at ffffffff8ad358f3
    #16 [ffffafc24001fe90] do_syscall_64 at ffffffff8b4d7169
    #17 [ffffafc24001ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b60009b
        RIP: 00007f6e37d61e1b  RSP: 00007ffe92aeadc8  RFLAGS: 00000206
        RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f6e37d61e1b
        RDX: 0000000000000000  RSI: 00007ffe92aea390  RDI: 000055a7d1ab2820
        RBP: 00007ffe92aeafd0   R8: 0000000000000000   R9: 00007ffe92aea1b0
        R10: 00007ffe92aea370  R11: 0000000000000206  R12: 000055a7d1ab0da0
        R13: 000055a7d1ab2820  R14: 000055a7d119307c  R15: 0000000000000006
        ORIG_RAX: 00000000000000a8  CS: 0033  SS: 002b
    
  4. The VMware balloon driver holds no memory (size and target are both 0), so memory ballooning is ruled out:

    crash> balloon | grep -A 3 size
      max_page_size = VMW_BALLOON_2M_PAGE,
      size = {
        counter = 0x0
      },
      target = 0x0,
    
  5. The kernel ring buffer records the panic with the same call trace:

    [  140.931624] Out of memory and no killable processes...
    [  140.932335] Kernel panic - not syncing: System is deadlocked on memory
    [  140.933036] CPU: 0 PID: 1 Comm: systemd-shutdow Kdump: loaded Not tainted 5.14.0-162.6.1.el9_1.x86_64 #1
    [  140.933746] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.18227214.B64.2106252220 06/25/2021
    [  140.935187] Call Trace:
    [  140.935894]  dump_stack_lvl+0x34/0x48
    [  140.936617]  panic+0x102/0x2d4
    [  140.937364]  out_of_memory.part.0.cold+0x2f/0x7e
    [  140.938080]  out_of_memory+0x3d/0x80
    [  140.938779]  __alloc_pages_slowpath.constprop.0+0x7cc/0x8a0
    [  140.939487]  __alloc_pages+0x1fe/0x230
    [  140.940188]  alloc_pages_vma+0x8f/0x2d0
    [  140.940886]  __read_swap_cache_async+0xfb/0x2c0
    [  140.941576]  swap_cluster_readahead+0x15c/0x2c0
    [  140.942275]  shmem_swapin+0xa7/0xf0
    [  140.942972]  shmem_swapin_page+0x1aa/0x380
    [  140.943655]  shmem_unuse_inode+0x44d/0x550
    [  140.944323]  shmem_unuse+0x8f/0x190
    [  140.944960]  try_to_unuse+0x69/0x410
    [  140.945583]  __do_sys_swapoff+0x1e3/0x6c0
    [  140.946202]  do_syscall_64+0x59/0x90
    [  140.946786]  ? syscall_exit_to_user_mode+0x12/0x30
    [  140.947381]  ? do_syscall_64+0x69/0x90
    [  140.947949]  ? __rseq_handle_notify_resume+0x26/0xc0
    [  140.948506]  ? exit_to_user_mode_loop+0xf6/0x160
    [  140.949053]  ? exit_to_user_mode_prepare+0xb6/0x100
    [  140.949587]  ? syscall_exit_to_user_mode+0x12/0x30
    [  140.950106]  ? do_syscall_64+0x69/0x90
    [  140.950608]  ? sysvec_apic_timer_interrupt+0x3c/0x90
    [  140.951105]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
    [  140.951615] RIP: 0033:0x7f6e37d61e1b
    [  140.952101] Code: 73 01 c3 48 8b 0d 05 b0 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 a8 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 af 1b 00 f7 d8 64 89 01 48
    [  140.953057] RSP: 002b:00007ffe92aeadc8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a8
    [  140.953532] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6e37d61e1b
    [  140.954003] RDX: 0000000000000000 RSI: 00007ffe92aea390 RDI: 000055a7d1ab2820
    [  140.954446] RBP: 00007ffe92aeafd0 R08: 0000000000000000 R09: 00007ffe92aea1b0
    [  140.954901] R10: 00007ffe92aea370 R11: 0000000000000206 R12: 000055a7d1ab0da0
    [  140.955349] R13: 000055a7d1ab2820 R14: 000055a7d119307c R15: 0000000000000006
    
  6. The OOM killer terminated two processes before it ran out of candidates:

    crash> log | grep 'Out of memory'
    [  139.587149] Out of memory: Killed process 6535 (swapoff) total-vm:6888kB, anon-rss:152kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:56kB oom_score_adj:0
    [  140.007426] Out of memory: Killed process 6561 (systemd-udevd) total-vm:33192kB, anon-rss:760kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
    [  140.931624] Out of memory and no killable processes...
    
  7. Processes were using very little memory when the OOM killer was invoked, so killing them could not free a meaningful amount:

    crash>  ps -G | sed 's/^>//' | awk '{ m[$9]+=$8 } END { for (item in m) { printf "%20s %10s KiB\n", item, m[item] } }' | sort -k 2 -r -n | head 
         systemd-shutdow      17604 KiB
          [zswap-shrink]          0 KiB
               [xprtiod]          0 KiB
       [xfs-reclaim/dm-]          0 KiB
         [xfs_mru_cache]          0 KiB
          [xfs-log/dm-0]          0 KiB
           [xfs-gc/dm-0]          0 KiB
         [xfs-conv/dm-0]          0 KiB
          [xfs-cil/dm-0]          0 KiB
          [xfs-buf/dm-0]          0 KiB
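
Steps 6 and 7 can be cross-checked by extracting the victims from the OOM messages and adding up how much anonymous memory the kills could have released. The following is a minimal Python sketch; parse_oom_kills and the regular expression are hypothetical helpers written for the message format shown in step 6, and other kernel versions may format these lines differently.

```python
import re

# Matches the "Killed process" lines as they appear in step 6.
OOM_KILL = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kills(log_text):
    """Return (pid, command, anon_rss_kib) for each OOM kill in the log."""
    return [
        (int(m["pid"]), m["comm"], int(m["anon_rss"]))
        for m in OOM_KILL.finditer(log_text)
    ]


# Log lines copied from the "log | grep 'Out of memory'" output in step 6.
LOG = """\
[  139.587149] Out of memory: Killed process 6535 (swapoff) total-vm:6888kB, anon-rss:152kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:56kB oom_score_adj:0
[  140.007426] Out of memory: Killed process 6561 (systemd-udevd) total-vm:33192kB, anon-rss:760kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
[  140.931624] Out of memory and no killable processes...
"""

victims = parse_oom_kills(LOG)
freed_kib = sum(rss for _, _, rss in victims)

for pid, comm, rss in victims:
    print(f"killed {comm} (pid {pid}), anon-rss {rss} KiB")
print(f"anonymous memory released by OOM kills: {freed_kib} KiB")
```

The two kills together account for under 1 MiB of anonymous memory, which is consistent with the per-process usage in step 7 and with the unused hugepage pool, rather than any process, holding the RAM.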
    

