Kernel panic with "BUG: unable to handle page fault for address" in __split_huge_pmd_locked()

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 9.4
    • 5.14.0-427.23.1.el9_4.x86_64

Issue

  • The kernel crashes in __split_huge_pmd_locked() with the message "BUG: unable to handle page fault for address".

Resolution

  • Update the kernel to 5.14.0-570.12.1.el9_6 or later to resolve the issue.

Workaround

Root Cause

  • The crash occurs in __split_huge_pmd_locked(), the function identified as problematic in the commit "mm: fix race between __split_huge_pmd_locked() and GUP-fast". That commit describes issues with handling PMD entries that represent non-present migration entries.

  • PMD entries (0xd7ffe7ff6e2bfe02 and 0xd7ffe7dfc9dbfe02) have the present bit unset but are improperly processed as if they were valid.
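
The present-bit observation can be checked by hand. The sketch below is illustrative only: it decodes the low flag bits of the two PMD values quoted above, using the x86-64 page-table bit positions (bit 0 present, bit 1 write, bit 7 huge page). The helper name decode_pmd_flags is hypothetical and is not part of crash or the kernel.

```python
# Illustrative sketch: decode the low flag bits of a raw x86-64 PMD value.
# Bit positions follow the x86-64 page-table entry layout; the helper
# name is hypothetical.

X86_FLAG_BITS = {
    "present": 0,
    "write": 1,
    "user": 2,
    "pwt": 3,
    "pcd": 4,
    "accessed": 5,
    "dirty": 6,
    "hugepage": 7,
    "global": 8,
}


def decode_pmd_flags(pmd: int) -> dict:
    """Return a name -> bool map for the low flag bits of a raw PMD value."""
    return {name: bool((pmd >> bit) & 1) for name, bit in X86_FLAG_BITS.items()}


if __name__ == "__main__":
    # The two PMD values quoted in the Root Cause section.
    for raw in (0xD7FFE7FF6E2BFE02, 0xD7FFE7DFC9DBFE02):
        flags = decode_pmd_flags(raw)
        print(f"{raw:#018x} present={flags['present']} hugepage={flags['hugepage']}")
```

Both values end in 0xe02, so bit 0 (present) is clear in each of them while bit 1 (write) is set.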

Diagnostic Steps

crash> sys | grep -i "RELEASE\|PANIC"
     RELEASE: 5.14.0-427.23.1.el9_4.x86_64
       PANIC: "Oops: 0000 [#1] PREEMPT SMP NOPTI" (check log for details)

PID: 983      TASK: ffff9e96443dd640  CPU: 118  COMMAND: "kswapd7"
 #0 [ffffaa35dc3bf6c0] machine_kexec at ffffffff956781e7
 #1 [ffffaa35dc3bf718] __crash_kexec at ffffffff957ef73a
 #2 [ffffaa35dc3bf7d8] crash_kexec at ffffffff957f09c8
 #3 [ffffaa35dc3bf7e0] oops_end at ffffffff9562f9bb
 #4 [ffffaa35dc3bf800] page_fault_oops at ffffffff9568a5eb
 #5 [ffffaa35dc3bf858] exc_page_fault at ffffffff96284af8
 #6 [ffffaa35dc3bf880] asm_exc_page_fault at ffffffff96400bc2
    [exception RIP: __split_huge_pmd+0x10c]    <======
    RIP: ffffffff959f34fc  RSP: ffffaa35dc3bf930  RFLAGS: 00010286
    RAX: fffff2f6c0ce4000  RBX: 0000000000000001  RCX: 28001800339001fd
    RDX: 000fffffffffffff  RSI: 0000000000000000  RDI: fffff297c4e48e68
    RBP: fffff297c4e48e68   R8: fffff296d64f8000   R9: 000fffffffffffff
    R10: ffff9d1640000000  R11: 0000000000000000  R12: ffff9d5779239800
    R13: fffff296d64f8000  R14: ffff9d5f59d3a610  R15: fffff296d64f8000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffaa35dc3bf990] try_to_migrate_one at ffffffff9599a2e0
 #8 [ffffaa35dc3bfa58] rmap_walk_anon at ffffffff959970ee
 #9 [ffffaa35dc3bfaa8] try_to_migrate at ffffffff9599aab3
#10 [ffffaa35dc3bfae8] split_huge_page_to_list at ffffffff959f46c0
#11 [ffffaa35dc3bfb78] deferred_split_scan at ffffffff959f4b20
#12 [ffffaa35dc3bfbe0] do_shrink_slab at ffffffff95946186
#13 [ffffaa35dc3bfc40] shrink_slab_memcg at ffffffff9594e471
#14 [ffffaa35dc3bfcb0] shrink_one at ffffffff9594e78c
#15 [ffffaa35dc3bfcf0] shrink_many at ffffffff9595103f
#16 [ffffaa35dc3bfd48] shrink_node at ffffffff95951626
#17 [ffffaa35dc3bfdc8] balance_pgdat at ffffffff9595196e
#18 [ffffaa35dc3bfee0] kswapd at ffffffff95951ef2
#19 [ffffaa35dc3bff18] kthread at ffffffff957358cd
#20 [ffffaa35dc3bff50] ret_from_fork at ffffffff95602c69

crash> struct vm_area_struct.vm_mm ffff9d5f59d3a610
  vm_mm = 0xffff9e96536ddf00,

crash> mm_struct.owner 0xffff9e96536ddf00
    owner = 0xffff9e96459b8000,

crash> ps -m 0xffff9e96459b8000
[0 00:00:00.006] [UN]  PID: 100359   TASK: ffff9e96459b8000  CPU: 20   COMMAND: "q"

crash> set 100359
    PID: 100359
COMMAND: "q"
   TASK: ffff9e96459b8000  [THREAD_INFO: ffff9e96459b8000]
    CPU: 20
  STATE: TASK_UNINTERRUPTIBLE 

crash> bt
PID: 100359   TASK: ffff9e96459b8000  CPU: 20   COMMAND: "q"
 #0 [ffffaa3600b6b9f8] __schedule at ffffffff9628fb7b
 #1 [ffffaa3600b6ba60] schedule at ffffffff9629000d
 #2 [ffffaa3600b6ba70] schedule_preempt_disabled at ffffffff96290371
 #3 [ffffaa3600b6ba78] rwsem_down_read_slowpath at ffffffff96292d0f
 #4 [ffffaa3600b6bb10] down_read at ffffffff96292ed5
 #5 [ffffaa3600b6bb20] rmap_walk_anon at ffffffff9599727e
 #6 [ffffaa3600b6bb70] __unmap_and_move at ffffffff959e8ef0
 #7 [ffffaa3600b6bbe0] unmap_and_move at ffffffff959e9193
 #8 [ffffaa3600b6bc20] migrate_pages at ffffffff959e9961
 #9 [ffffaa3600b6bce8] migrate_misplaced_page at ffffffff959ea44b
#10 [ffffaa3600b6bd60] do_huge_pmd_numa_page at ffffffff959f1f35
#11 [ffffaa3600b6bdc0] __handle_mm_fault at ffffffff95984430
#12 [ffffaa3600b6bea0] handle_mm_fault at ffffffff9598453d
#13 [ffffaa3600b6bed8] do_user_addr_fault at ffffffff9568ac94
#14 [ffffaa3600b6bf28] exc_page_fault at ffffffff96284ab2
#15 [ffffaa3600b6bf50] asm_exc_page_fault at ffffffff96400bc2
    RIP: 000000000041f407  RSP: 00007ffda51f8ff0  RFLAGS: 00010202
    RAX: 072dd649a6d19916  RBX: 0000000001f5b9bd  RCX: 000000000104bbd6
    RDX: 0000000000000001  RSI: 0000000000000013  RDI: 0000000001f5b9bd
    RBP: 0000000000000013   R8: ece89f7e8104bbd6   R9: 00007f04a0057ec0
    R10: 000000008104bbd6  R11: 0000000000000000  R12: 0000000004000000
    R13: 0000000003909472  R14: 0000000000000013  R15: 00007f0497dfa010
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

crash> vtop -u 0x00007f04a01fd000 -c 100359
VIRTUAL     PHYSICAL        
7f04a01fd000  (not mapped)

   PGD: 181417e47f0 => 181659b7067
   PUD: 181659b7090 => 4139239067
   PMD: 4139239800 => d7ffe7ffcc6ffe02  <==

      VMA           START       END     FLAGS FILE
ffff9d5f59d3a610 7f0497dfa000 7f04d7dfb000 8100073 

crash> struct vm_fault.address ffffaa3600b6bde0
    address = 0x7f04a0057000,

crash> vtop -u 0x7f04a0057000 -c 100359
VIRTUAL     PHYSICAL        
7f04a0057000  (not mapped)

   PGD: 181417e47f0 => 181659b7067
   PUD: 181659b7090 => 4139239067
   PMD: 4139239800 => d7ffe7ffcc6ffe02  <==

      VMA           START       END     FLAGS FILE
ffff9d5f59d3a610 7f0497dfa000 7f04d7dfb000 8100073 
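
Both addresses passed to vtop land on the same PMD entry, which can be confirmed from the address bits alone. The following sketch assumes 4-level x86-64 paging (9 index bits per level, 8-byte entries); the helper name table_offsets is hypothetical.

```python
# Illustrative sketch: split an x86-64 virtual address into page-table
# entry offsets, assuming 4-level paging with 9 index bits per level
# and 8-byte entries.

PGD_SHIFT, PUD_SHIFT, PMD_SHIFT = 39, 30, 21
INDEX_MASK = 0x1FF  # 512 entries per table
ENTRY_SIZE = 8      # bytes per page-table entry


def table_offsets(vaddr: int) -> dict:
    """Return the byte offset of the entry selected at each table level."""
    return {
        "pgd": ((vaddr >> PGD_SHIFT) & INDEX_MASK) * ENTRY_SIZE,
        "pud": ((vaddr >> PUD_SHIFT) & INDEX_MASK) * ENTRY_SIZE,
        "pmd": ((vaddr >> PMD_SHIFT) & INDEX_MASK) * ENTRY_SIZE,
    }


if __name__ == "__main__":
    # The two user addresses examined with vtop above.
    for va in (0x7F04A01FD000, 0x7F04A0057000):
        offs = table_offsets(va)
        print(f"{va:#x}: " + ", ".join(f"{k}+{v:#x}" for k, v in offs.items()))
```

The computed offsets (0x7f0, 0x90, 0x800) match the low 12 bits of the PGD, PUD and PMD entry addresses printed by vtop (181417e47f0, 181659b7090, 4139239800).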

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  528301742    2015.3 GB         ----
         FREE  11224572      42.8 GB    2% of TOTAL MEM
         USED  517077170    1972.5 GB   97% of TOTAL MEM
       SHARED  2018271       7.7 GB    0% of TOTAL MEM
      BUFFERS    21466      83.9 MB    0% of TOTAL MEM
       CACHED  258287326     985.3 GB   48% of TOTAL MEM
         SLAB  7274992      27.8 GB    1% of TOTAL MEM

   TOTAL HUGE   614400       2.3 GB         ----
    HUGE FREE   614400       2.3 GB  100% of TOTAL HUGE  <==

   TOTAL SWAP   262143      1024 MB         ----
    SWAP USED   262135      1024 MB   99% of TOTAL SWAP
    SWAP FREE        8        32 KB    0% of TOTAL SWAP

 COMMIT LIMIT  264105814    1007.5 GB         ----
    COMMITTED  416565423    1589.1 GB  157% of TOTAL LIMIT
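
The kmem -i figures show swap almost exhausted and commit well above the limit. As a quick sanity check of those numbers, the sketch below recomputes them from the page counts. It assumes 4 KiB base pages and that percentages are truncated rather than rounded; the helper names are hypothetical.

```python
# Illustrative sketch: recompute sizes and percentages from kmem -i page counts.
# Assumes 4 KiB base pages and truncating integer percentages.

PAGE_SIZE = 4096  # bytes


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB."""
    return pages * PAGE_SIZE / 2**30


def percent_of(part: int, total: int) -> int:
    """Integer percentage of part relative to total, truncated."""
    return part * 100 // total


if __name__ == "__main__":
    total_mem, committed, commit_limit = 528301742, 416565423, 264105814
    swap_total, swap_used = 262143, 262135
    print(f"total memory: {pages_to_gib(total_mem):.1f} GB")
    print(f"committed:    {percent_of(committed, commit_limit)}% of commit limit")
    print(f"swap used:    {percent_of(swap_used, swap_total)}% of total swap")
```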

crash> px boot_cpu_data | grep -e x86_model_id -e x86_phys_bits -e microcode
  x86_phys_bits = 0x30,
  x86_model_id = "AMD EPYC 7742 64-Core Processor\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",  <==
  microcode = 0x830107b,

  • Another vmcore:

PID: 979      TASK: ffff8f2544659cc0  CPU: 28   COMMAND: "kswapd1"
        ...
    [exception RIP: __split_huge_pmd+268]
    RIP: ffffffffa6bf34fc  RSP: ffffa65bdc39f930  RFLAGS: 00010286
    RAX: ffffc9b580d89000  RBX: 0000000000000001  RCX: 28001820362401fd
    RDX: 000fffffffffffff  RSI: 0000000000000000  RDI: ffffc957044a31e8
    RBP: ffffc957044a31e8   R8: ffffc95717740000   R9: 000fffffffffffff
    R10: ffff8d2540000000  R11: 0000000000000000  R12: ffff8da6528c7590
    R13: ffffc95717740000  R14: ffff8da74f7e4cb0  R15: ffffc95717740000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffa65bdc39f990] try_to_migrate_one at ffffffffa6b9a2e0
 #8 [ffffa65bdc39fa58] rmap_walk_anon at ffffffffa6b970ee
 #9 [ffffa65bdc39faa8] try_to_migrate at ffffffffa6b9aab3
#10 [ffffa65bdc39fae8] split_huge_page_to_list at ffffffffa6bf46c0
#11 [ffffa65bdc39fb78] deferred_split_scan at ffffffffa6bf4b20
#12 [ffffa65bdc39fbe0] do_shrink_slab at ffffffffa6b46186
#13 [ffffa65bdc39fc40] shrink_slab_memcg at ffffffffa6b4e471
#14 [ffffa65bdc39fcb0] shrink_one at ffffffffa6b4e78c
#15 [ffffa65bdc39fcf0] shrink_many at ffffffffa6b5103f
#16 [ffffa65bdc39fd48] shrink_node at ffffffffa6b51626
#17 [ffffa65bdc39fdc8] balance_pgdat at ffffffffa6b5196e
#18 [ffffa65bdc39fee0] kswapd at ffffffffa6b51ef2
#19 [ffffa65bdc39ff18] kthread at ffffffffa69358cd
#20 [ffffa65bdc39ff50] ret_from_fork at ffffffffa6802c69

The vma:

crash> kmem ffff8da74f7e4cb0
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff8d264004a440      232     263878    295260   4218    16k  vm_area_struct
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffc957083df900  ffff8da74f7e4000     2     70         67     3
  FREE / [ALLOCATED]
  [ffff8da74f7e4cb0]

crash> vm_area_struct.vm_start,vm_end ffff8da74f7e4cb0 -x
  vm_start = 0x7f71c3dfa000,
  vm_end = 0x7f7203dfb000,

crash> pmd_t ffff8da6528c7590 -x
struct pmd_t {
  pmd = 0xd7ffe7dfc9dbfe02
}

The address:

crash> vtop -u 00007f71d6400000
VIRTUAL     PHYSICAL        
7f71d6400000  (not accessible)

crash> epython decode-pte -x
present: False
write: True
user: False
pwt: False
pcd: False
accessed: False
dirty: False
hugepage: False
global: False
frame: 15564413921466773504

crash> px 15564413921466773504
$1 = 0xd7ffe7ff6e2bf000
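
The "frame" value printed above is the raw entry with its low 12 bits cleared, and it does not look like a physical address. The sketch below illustrates this under two assumptions: the frame is computed as entry & ~0xfff (which matches the px output above), and the CPU has the 48-bit physical address width (x86_phys_bits = 0x30) reported for the first vmcore. Helper names are hypothetical.

```python
# Illustrative sketch: check whether the "frame" part of a page-table entry
# fits within the CPU's physical address width.
# Assumptions: frame = entry with the low 12 bits cleared; 48 physical bits.

PAGE_MASK = 0xFFF
PHYS_BITS = 0x30  # x86_phys_bits from boot_cpu_data


def frame_of(entry: int) -> int:
    """Clear the low 12 flag bits of a raw entry."""
    return entry & ~PAGE_MASK


def fits_physical_width(addr: int, phys_bits: int = PHYS_BITS) -> bool:
    """True if addr has no bits set at or above phys_bits."""
    return (addr >> phys_bits) == 0


if __name__ == "__main__":
    bad = frame_of(0xD7FFE7FF6E2BFE02)  # non-present PMD value from Root Cause
    good = frame_of(0x4139239067)       # present PUD entry from the vtop output
    print(f"{bad:#x} fits in {PHYS_BITS} bits: {fits_physical_width(bad)}")
    print(f"{good:#x} fits in {PHYS_BITS} bits: {fits_physical_width(good)}")
```

A value with bits set above the physical address width cannot be a real page frame address, which is consistent with the entry being a non-present entry rather than a valid mapping.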
