Kernel panic due to "BUG: unable to handle kernel NULL pointer dereference at 0000000000000001" in get_align_mask().

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux

Issue

  • The system panics with the following messages in the kernel ring buffer:
[147536.316873] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[147536.316947] PGD 0 P4D 0 
[147536.316974] Oops: 0002 [#1] SMP PTI
[147536.317004] CPU: 1 PID: 657219 Comm: sh Kdump: loaded Tainted: G                  --------r-  - 4.18.0-553.22.1.el8_10.x86_64 #1
[147536.317077] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[147536.317141] RIP: 0010:get_align_mask+0x3d/0x40
[147536.317178] Code: 8b 3c 25 40 dc 01 00 48 8b 17 b9 02 00 00 00 48 c1 ea 1d 83 e2 01 29 d1 85 f1 74 0c f6 47 26 40 48 0f 45 05 ed 1b f9 01 c3 cc <cc> cc cc 0f 1f 44 00 00 55 53 48 89 fb e8 b1 ff ff ff 48 8b 2d da
[147536.317287] RSP: 0018:ffffb5c01200fd38 EFLAGS: 00010207
[147536.317330] RAX: 0000000000000000 RBX: 00007ffe2398e000 RCX: 00007ffffffff000
[147536.317373] RDX: 0000000000092000 RSI: 00000000ffffffff RDI: 000000002398f000
[147536.317402] RBP: 0000000000000000 R08: 0000000080000000 R09: 00007f50c0869000
[147536.317429] R10: 0000000000000009 R11: 00007f50c0887ee0 R12: 0000000000000009
[147536.317456] R13: ffffb5c01200c000 R14: ffff8eec718f0c00 R15: ffff8eed9f519400
[147536.317483] FS:  0000000000000000(0000) GS:ffff8eefdfc40000(0000) knlGS:0000000000000000
[147536.317513] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[147536.317536] CR2: 0000000000000001 CR3: 00000001a8332005 CR4: 00000000001706e0
[147536.317597] Call Trace:
[147536.317660]  ? __die_body+0x1a/0x60
[147536.317688]  ? no_context+0x1ba/0x3f0
[147536.317708]  ? __bad_area_nosemaphore+0x157/0x180
[147536.317729]  ? do_page_fault+0x37/0x12d
[147536.317749]  ? page_fault+0x1e/0x30
[147536.317774]  ? get_align_mask+0x3d/0x40
[147536.317794]  align_vdso_addr+0x24/0x40
[147536.317814]  arch_setup_additional_pages+0x8e/0xf0
[147536.317837]  load_elf_binary+0xebf/0x1350
[147536.317860]  search_binary_handler+0x119/0x3a0
[147536.317883]  do_execveat_common.isra.38+0x5b7/0x980
[147536.317906]  __x64_sys_execve+0x32/0x40
[147536.317926]  do_syscall_64+0x5b/0x1a0
[147536.317947]  entry_SYSCALL_64_after_hwframe+0x66/0xcb
[147536.317971] RIP: 0033:0x7fa40aafc8db
[147536.317998] Code: Unable to access opcode bytes at RIP 0x7fa40aafc8b1.
[147536.318026] RSP: 002b:00007ffccd775a68 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[147536.318056] RAX: ffffffffffffffda RBX: 00007ffccd775a90 RCX: 00007fa40aafc8db
[147536.318083] RDX: 00005581bc47dcd0 RSI: 00007ffccd775a70 RDI: 00007fa40be62788
[147536.318110] RBP: 00007ffccd775af0 R08: 00007ffccd775b00 R09: 00007ffccd775a90
[147536.318136] R10: 0000000000000001 R11: 0000000000000202 R12: 00007fa40c093be8
[147536.318162] R13: 00007fa40be62d80 R14: 00007fa40be62788 R15: 00007ffccd775b10
[147536.318191] Modules linked in: mptcp_diag xsk_diag vsock_diag raw_diag unix_diag af_packet_diag netlink_diag udp_diag tcp_diag inet_diag nf_tables libcrc32c nfnetlink cfg80211 rfkill vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmwgfx drm_ttm_helper ttm intel_rapl_msr intel_rapl_common sb_edac drm_kms_helper crct10dif_pclmul crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect sysimgblt rapl vmw_balloon drm pcspkr joydev i2c_piix4 vmw_vmci binfmt_misc ext4 mbcache jbd2 sr_mod cdrom sd_mod t10_pi sg ata_generic ata_piix libata mptspi scsi_transport_spi mptscsih mptbase crc32c_intel serio_raw vmxnet3 dm_mod fuse
[147536.319871] Red Hat flags: eBPF/event eBPF/rawtrace
[147536.320335] CR2: 0000000000000001

Resolution

  • For guest systems:
    Contact the hypervisor vendor to determine whether the issue is caused by a CPU emulation problem. Additionally, assess the health of the underlying hardware on the hypervisor host.

  • For physical systems:
    Check for faulty CPUs that may be contributing to the issue.
    Engage the hardware vendor to run complete hardware diagnostics and to identify and replace any faulty hardware.

Root Cause

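  • The kernel panic occurred because the CPU executed code at get_align_mask+0x3d, one of the int3 padding bytes after the function's ret instruction at +0x3b. Both conditional jumps in get_align_mask() target +0x3b, so no valid control flow can reach +0x3d. This should not be possible under normal conditions and points to a problem with the CPU or, on a virtual machine, with the hypervisor's CPU emulation.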

Diagnostic Steps

  • System Information
crash> sys | grep -i "RELEASE\|PANIC"
     RELEASE: 4.18.0-553.22.1.el8_10.x86_64
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000001"
  • Backtrace of the panic task.
crash> set -p
    PID: 657219
COMMAND: "sh"
   TASK: ffff8ee8d560a800  [THREAD_INFO: ffff8ee8d560a800]
    CPU: 1
  STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 657219   TASK: ffff8ee8d560a800  CPU: 1    COMMAND: "sh"
 #0 [ffffb5c01200fa60] machine_kexec at ffffffffb506f383
 #1 [ffffb5c01200fab8] __crash_kexec at ffffffffb51bacba
 #2 [ffffb5c01200fb78] crash_kexec at ffffffffb51bbbf1
 #3 [ffffb5c01200fb90] oops_end at ffffffffb502d771
 #4 [ffffb5c01200fbb0] no_context at ffffffffb5081d83
 #5 [ffffb5c01200fc08] __bad_area_nosemaphore at ffffffffb50820e7
 #6 [ffffb5c01200fc50] do_page_fault at ffffffffb5082da7
 #7 [ffffb5c01200fc80] page_fault at ffffffffb5c011fe
    [exception RIP: get_align_mask+0x3d]
    RIP: ffffffffb50300dd  RSP: ffffb5c01200fd38  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: 00007ffe2398e000  RCX: 00007ffffffff000
    RDX: 0000000000092000  RSI: 00000000ffffffff  RDI: 000000002398f000
    RBP: 0000000000000000   R8: 0000000080000000   R9: 00007f50c0869000
    R10: 0000000000000009  R11: 00007f50c0887ee0  R12: 0000000000000009
    R13: ffffb5c01200c000  R14: ffff8eec718f0c00  R15: ffff8eed9f519400
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffb5c01200fd38] align_vdso_addr at ffffffffb5030104
 #9 [ffffb5c01200fd50] arch_setup_additional_pages at ffffffffb50060be
#10 [ffffb5c01200fd68] load_elf_binary at ffffffffb53db99f
#11 [ffffb5c01200fe30] search_binary_handler at ffffffffb5376039
#12 [ffffb5c01200fe78] do_execveat_common at ffffffffb53772c7
#13 [ffffb5c01200ff18] __x64_sys_execve at ffffffffb5377802
#14 [ffffb5c01200ff38] do_syscall_64 at ffffffffb500549b
#15 [ffffb5c01200ff50] entry_SYSCALL_64_after_hwframe at ffffffffb5c0012e
    RIP: 00007fa40aafc8db  RSP: 00007ffccd775a68  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 00007ffccd775a90  RCX: 00007fa40aafc8db
    RDX: 00005581bc47dcd0  RSI: 00007ffccd775a70  RDI: 00007fa40be62788
    RBP: 00007ffccd775af0   R8: 00007ffccd775b00   R9: 00007ffccd775a90
    R10: 0000000000000001  R11: 0000000000000202  R12: 00007fa40c093be8
    R13: 00007fa40be62d80  R14: 00007fa40be62788  R15: 00007ffccd775b10
    ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b

  • Source code of get_align_mask() (arch/x86/kernel/sys_x86_64.c):
static unsigned long get_align_mask(void)
{
        /* handle 32- and 64-bit case with a single conditional */
        if (va_align.flags < 0 || !(va_align.flags & (2 - mmap_is_ia32())))
                return 0;

        if (!(current->flags & PF_RANDOMIZE))
                return 0;

        return va_align.mask;
}
  • Disassemble the function in which the kernel panic occurred:
crash> dis get_align_mask
0xffffffffb50300a0 <get_align_mask>:    nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffb50300a5 <get_align_mask+0x5>:        mov    0x1f91c15(%rip),%esi        # 0xffffffffb6fc1cc0 <va_align>
0xffffffffb50300ab <get_align_mask+0xb>:        xor    %eax,%eax
0xffffffffb50300ad <get_align_mask+0xd>:        test   %esi,%esi
0xffffffffb50300af <get_align_mask+0xf>:        js     0xffffffffb50300db <get_align_mask+0x3b>
0xffffffffb50300b1 <get_align_mask+0x11>:       mov    %gs:0x1dc40,%rdi
0xffffffffb50300ba <get_align_mask+0x1a>:       mov    (%rdi),%rdx
0xffffffffb50300bd <get_align_mask+0x1d>:       mov    $0x2,%ecx
0xffffffffb50300c2 <get_align_mask+0x22>:       shr    $0x1d,%rdx
0xffffffffb50300c6 <get_align_mask+0x26>:       and    $0x1,%edx
0xffffffffb50300c9 <get_align_mask+0x29>:       sub    %edx,%ecx
0xffffffffb50300cb <get_align_mask+0x2b>:       test   %esi,%ecx
0xffffffffb50300cd <get_align_mask+0x2d>:       je     0xffffffffb50300db <get_align_mask+0x3b>
0xffffffffb50300cf <get_align_mask+0x2f>:       testb  $0x40,0x26(%rdi)
0xffffffffb50300d3 <get_align_mask+0x33>:       cmovne 0x1f91bed(%rip),%rax        # 0xffffffffb6fc1cc8 <va_align+0x8>
0xffffffffb50300db <get_align_mask+0x3b>:       ret    
0xffffffffb50300dc <get_align_mask+0x3c>:       int3   
0xffffffffb50300dd <get_align_mask+0x3d>:       int3   
0xffffffffb50300de <get_align_mask+0x3e>:       int3   
0xffffffffb50300df <get_align_mask+0x3f>:       int3   

  • Check the value of va_align.flags:
crash> struct va_alignment.flags va_align -d
  flags = -1,

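Because va_align.flags is -1, test %esi,%esi at +0xd sets the sign flag, so the js at +0xf should have jumped straight to the ret instruction at +0x3b, returning 0 in RAX. Instead, execution ended up at +0x3d, an int3 padding byte after the ret. No instruction in get_align_mask() branches to +0x3d, so a correctly functioning CPU should not have executed code at that address.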

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
