Kernel panic with "BUG: unable to handle kernel NULL pointer dereference at 00000000000002d4"
Environment
- Red Hat Enterprise Linux 7
- 3rd party module vxdmp loaded
Issue
- The kernel panics with the panic string "BUG: unable to handle kernel NULL pointer dereference at 00000000000002d4", with the exception RIP at bdevname+0x1a.
Resolution
- Red Hat neither ships nor supports this module. Engage the vendor of the vxdmp module for further investigation.
- As a workaround, if the system does not depend on storage devices provided through vxdmp, you may disable it temporarily. For more information on how to prevent a module from loading on boot, please refer to the following knowledge base article:
  How do I prevent a kernel module from loading automatically?
Root Cause
The system panicked because the 3rd party vxdmp module passed invalid data to bdevname() from its get_dip_from_device() function.
Diagnostic Steps
Pre-requisites
- Deploy kdump in Order to Collect a vmcore:
  - Vmcore analysis is required to determine whether you are impacted by this issue. This first requires that a vmcore is dumped successfully.
  - If the kexec-tools package is absent or the kdump service is inactive, please reference the following article to install, enable, start, and configure kdump:
    How to troubleshoot kernel crashes, hangs, or reboots with kdump on Red Hat Enterprise Linux
- Prepare crash Environment for vmcore Analysis:
  - Please reference the following article to set up a vmcore analysis environment:
    How to set up a vmcore analysis environment?
Vmcore Analysis
Example: 1
- System Information:

    crash> sys |grep -eREL -ePAN -eLOAD
    LOAD AVERAGE: 1.65, 1.27, 1.16
         RELEASE: 3.10.0-1160.88.1.el7.x86_64
           PANIC: "BUG: unable to handle kernel NULL pointer dereference at 00000000000002d4"

    crash> sys -i | head -5
     DMI_BIOS_VENDOR: HP
    DMI_BIOS_VERSION: P70
       DMI_BIOS_DATE: 05/24/2019
      DMI_SYS_VENDOR: HP
    DMI_PRODUCT_NAME: ProLiant DL380p Gen8
- The backtrace of the panicking task shows the function where the panic occurred, as indicated by the RIP. Here the panic occurred in bdevname. The vxdmp module passed invalid data from its get_dip_from_device() function:

    crash> bt
    PID: 6610   TASK: ffff935d306d2100  CPU: 3   COMMAND: "dmpdaemon"
     #0 [ffff935d3091f9a0] machine_kexec at ffffffffac869514
     #1 [ffff935d3091fa00] __crash_kexec at ffffffffac929e82
     #2 [ffff935d3091fad0] crash_kexec at ffffffffac929f78
     #3 [ffff935d3091fae8] oops_end at ffffffffacfbc818
     #4 [ffff935d3091fb10] no_context at ffffffffac87974c
     #5 [ffff935d3091fb60] __bad_area_nosemaphore at ffffffffac879a2a
     #6 [ffff935d3091fbb0] bad_area_nosemaphore at ffffffffac879b54
     #7 [ffff935d3091fbc0] __do_page_fault at ffffffffacfbf8d0
     #8 [ffff935d3091fc30] do_page_fault at ffffffffacfbfb05
     #9 [ffff935d3091fc60] page_fault at ffffffffacfbb7b8
        [exception RIP: bdevname+0x1a]
        RIP: ffffffffacb8216a  RSP: ffff935d3091fd18  RFLAGS: 00010286
        RAX: 0000000000000000  RBX: ffff93534d83e700  RCX: 0000000000000003
        RDX: ffff93534d83e700  RSI: ffff93534d83e700  RDI: ffff935533028400
        RBP: ffff935d3091fd18   R8: fdb5d31615bcf001   R9: ffffffffc09f58e7
        R10: ffff934ebfc03b00  R11: ffffcf0258360f80  R12: 0000000000800070
        R13: 0000000000000000  R14: ffffffffc0715700  R15: ffff935d3076fc00
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    #10 [ffff935d3091fd20] get_dip_from_device at ffffffffc09f67bc [vxdmp]
    #11 [ffff935d3091fd50] dmp_node_to_dip at ffffffffc09f6820 [vxdmp]
    #12 [ffff935d3091fd60] dmp_check_nonscsi at ffffffffc0a30459 [vxdmp]
    #13 [ffff935d3091fd88] dmp_check_path_alive at ffffffffc0a30fcb [vxdmp]
    #14 [ffff935d3091fdc8] dmp_check_disabled_policy at ffffffffc0a3164a [vxdmp]
    #15 [ffff935d3091fe60] dmp_initiate_restore at ffffffffc0a31b13 [vxdmp]
    #16 [ffff935d3091fe90] dmp_daemons_loop at ffffffffc0a3fc4c [vxdmp]
    #17 [ffff935d3091fec8] kthread at ffffffffac8cb621

    crash> dis -r bdevname+0x1a
    0xffffffffacb82150 <bdevname>:       data16 data16 data16 xchg %ax,%ax
    0xffffffffacb82155 <bdevname+0x5>:   push   %rbp
    0xffffffffacb82156 <bdevname+0x6>:   mov    0x88(%rdi),%rax
    0xffffffffacb8215d <bdevname+0xd>:   mov    %rsi,%rdx
    0xffffffffacb82160 <bdevname+0x10>:  mov    0x98(%rdi),%rdi
    0xffffffffacb82167 <bdevname+0x17>:  mov    %rsp,%rbp
    0xffffffffacb8216a <bdevname+0x1a>:  mov    0x2d4(%rax),%esi
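The address in the panic string can be cross-checked against the faulting instruction and the saved registers. The following is a minimal sketch, not part of the original analysis: it parses the AT&T-style operand of the instruction at bdevname+0x1a and adds the displacement to the RAX value from the backtrace. The helper name effective_address is hypothetical and only handles the simple disp(%base) operand form seen here.

```python
import re

# Register values copied from the Example 1 backtrace.
REGISTERS = {"rax": 0x0000000000000000, "rdi": 0xFFFF935533028400}

# AT&T-syntax memory operand of the form disp(%base), e.g. 0x2d4(%rax).
OPERAND_RE = re.compile(r"(?P<disp>0x[0-9a-fA-F]+)\(%(?P<base>\w+)\)")


def effective_address(instruction: str, registers: dict) -> int:
    """Return base register value plus displacement for a disp(%base) operand."""
    match = OPERAND_RE.search(instruction)
    if match is None:
        raise ValueError(f"no disp(%base) operand in: {instruction!r}")
    return registers[match.group("base")] + int(match.group("disp"), 16)


faulting_instruction = "mov 0x2d4(%rax),%esi"
fault_address = effective_address(faulting_instruction, REGISTERS)
print(f"{fault_address:016x}")  # prints 00000000000002d4
```

The result matches the address reported in the panic string, which is what a NULL base register plus a structure member offset looks like.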
- Third party module vxdmp loaded on the server:

    crash> mod -t | grep vxdmp
    vxdmp  POE
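The letters printed next to the module name are module taint flags. As a small illustrative sketch (the dictionary and function names are hypothetical), the three letters seen here can be decoded using the meanings the Linux kernel assigns to them: P for a proprietary module, O for an out-of-tree module, and E for an unsigned module.

```python
# Meanings of the module taint letters that appear in the output above.
MODULE_TAINT_FLAGS = {
    "P": "proprietary module",
    "O": "out-of-tree module",
    "E": "unsigned module",
}


def describe_taints(flags: str) -> list:
    """Map each taint letter to its meaning; unknown letters are reported as such."""
    return [MODULE_TAINT_FLAGS.get(letter, "unknown flag") for letter in flags]


mod_line = "vxdmp  POE"
name, flags = mod_line.split()
descriptions = describe_taints(flags)
for letter, meaning in zip(flags, descriptions):
    print(f"{name}: {letter} = {meaning}")
```

All three flags are consistent with a module that is neither shipped nor signed by Red Hat.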
Example: 2
This example shows the value passed to bdevname() in %rdi, and that this value is invalid. bdevname() was called by get_dip_from_device() with an invalid address in %rdi.
- Backtrace of the panic task:

    crash> bt
    PID: 4639   TASK: ffff99985bc11080  CPU: 2   COMMAND: "dmpdaemon"
     #0 [ffff999022aef9d0] machine_kexec at ffffffffbb0663d4
     #1 [ffff999022aefa30] __crash_kexec at ffffffffbb122ae2
     #2 [ffff999022aefb00] crash_kexec at ffffffffbb122bd0
     #3 [ffff999022aefb18] oops_end at ffffffffbb791798
     #4 [ffff999022aefb40] no_context at ffffffffbb075d14
     #5 [ffff999022aefb90] __bad_area_nosemaphore at ffffffffbb075fe2
     #6 [ffff999022aefbe0] bad_area_nosemaphore at ffffffffbb076104
     #7 [ffff999022aefbf0] __do_page_fault at ffffffffbb794750
     #8 [ffff999022aefc60] do_page_fault at ffffffffbb794975
     #9 [ffff999022aefc90] page_fault at ffffffffbb790778
        [exception RIP: bdevname+26]
        RIP: ffffffffbb36cdba  RSP: ffff999022aefd48  RFLAGS: 00010286
        RAX: 0000000000000000  RBX: ffff998ff4ed4680  RCX: ffff999022aeffd8
        RDX: ffff998ff4ed4680  RSI: ffff998ff4ed4680  RDI: ffff9998595d5c00
        RBP: ffff999022aefd48   R8: fdf48b7e745a5004   R9: ffffffffc1051897
        R10: ffff9989bfc03b00  R11: ffffe516ded3b500  R12: 00000000041000e0
        R13: ffff99905c9f1380  R14: ffff998ff4ed48c0  R15: ffff999054a50400
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    #10 [ffff999022aefd50] get_dip_from_device at ffffffffc1052752 [vxdmp]
    #11 [ffff999022aefd78] dmp_node_to_dip at ffffffffc10527b0 [vxdmp]
    #12 [ffff999022aefd88] dmp_check_nonscsi at ffffffffc108c389 [vxdmp]
    #13 [ffff999022aefdb0] dmp_probe_required at ffffffffc108c401 [vxdmp]
    #14 [ffff999022aefdc8] dmp_check_disabled_policy at ffffffffc108d390 [vxdmp]
    #15 [ffff999022aefe60] dmp_initiate_restore at ffffffffc108da43 [vxdmp]
    #16 [ffff999022aefe90] dmp_daemons_loop at ffffffffc109bb5c [vxdmp]
    #17 [ffff999022aefec8] kthread at ffffffffbb0c5f91
- The disassembly of the function shows that the kernel panicked while executing the instruction marked below:

    crash> dis -rl ffffffffbb36cdba
    /usr/src/debug/kernel-3.10.0-1160.76.1.el7/linux-3.10.0-1160.76.1.el7.x86_64/block/partition-generic.c: 47
    0xffffffffbb36cda0 <bdevname>:     data16 data16 data16 xchg %ax,%ax
    0xffffffffbb36cda5 <bdevname+5>:   push   %rbp
    /usr/src/debug/kernel-3.10.0-1160.76.1.el7/linux-3.10.0-1160.76.1.el7.x86_64/block/partition-generic.c: 48
    0xffffffffbb36cda6 <bdevname+6>:   mov    0x88(%rdi),%rax
    /usr/src/debug/kernel-3.10.0-1160.76.1.el7/linux-3.10.0-1160.76.1.el7.x86_64/block/partition-generic.c: 47
    0xffffffffbb36cdad <bdevname+13>:  mov    %rsi,%rdx
    /usr/src/debug/kernel-3.10.0-1160.76.1.el7/linux-3.10.0-1160.76.1.el7.x86_64/block/partition-generic.c: 48
    0xffffffffbb36cdb0 <bdevname+16>:  mov    0x98(%rdi),%rdi
    /usr/src/debug/kernel-3.10.0-1160.76.1.el7/linux-3.10.0-1160.76.1.el7.x86_64/block/partition-generic.c: 47
    0xffffffffbb36cdb7 <bdevname+23>:  mov    %rsp,%rbp
    /usr/src/debug/kernel-3.10.0-1160.76.1.el7/linux-3.10.0-1160.76.1.el7.x86_64/block/partition-generic.c: 48
    0xffffffffbb36cdba <bdevname+26>:  mov    0x2d4(%rax),%esi    <<<----
The %rax register is NULL inside bdevname(), so the faulting address is the 0x2d4 displacement itself:

    crash> bt | grep RAX
        RAX: 0000000000000000  RBX: ffff998ff4ed4680  RCX: ffff999022aeffd8
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

    crash> px (0x2d4+0x0000000000000000)
    $1 = 0x2d4
Inside bdevname(), %rax is loaded from the memory at offset 0x88 from %rdi:

    crash> dis -r ffffffffbb36cdba
    0xffffffffbb36cda0 <bdevname>:     data16 data16 data16 xchg %ax,%ax
    0xffffffffbb36cda5 <bdevname+5>:   push   %rbp
    0xffffffffbb36cda6 <bdevname+6>:   mov    0x88(%rdi),%rax    <<<----
    0xffffffffbb36cdad <bdevname+13>:  mov    %rsi,%rdx
    0xffffffffbb36cdb0 <bdevname+16>:  mov    0x98(%rdi),%rdi
    0xffffffffbb36cdb7 <bdevname+23>:  mov    %rsp,%rbp
    0xffffffffbb36cdba <bdevname+26>:  mov    0x2d4(%rax),%esi
During execution of get_dip_from_device(), prior to the panic, the value of %rax is copied to %r13 and then to %rdi, and neither %rax nor %r13 is modified before the call. Hence, at the call to bdevname(), %rdi equals %rax, which equals %r13:

    crash> dis -r ffffffffc1052752 | tail
    0xffffffffc1052730 <get_dip_from_device+48>:  call   0xffffffffbb28ec10 <bdget>
    0xffffffffc1052735 <get_dip_from_device+53>:  test   %rax,%rax
    0xffffffffc1052738 <get_dip_from_device+56>:  mov    %rax,%r13    <<<---- rax = r13
    0xffffffffc105273b <get_dip_from_device+59>:  je     0xffffffffc1052780 <get_dip_from_device+128>
    0xffffffffc105273d <get_dip_from_device+61>:  cmpq   $0x0,0x98(%rax)
    0xffffffffc1052745 <get_dip_from_device+69>:  je     0xffffffffc1052780 <get_dip_from_device+128>
    0xffffffffc1052747 <get_dip_from_device+71>:  mov    %rbx,%rsi
    0xffffffffc105274a <get_dip_from_device+74>:  mov    %rax,%rdi    <<<---- %rdi is set from %rax before calling bdevname(), so rax = rdi = r13
    0xffffffffc105274d <get_dip_from_device+77>:  call   0xffffffffbb36cda0 <bdevname>
    0xffffffffc1052752 <get_dip_from_device+82>:  mov    %r13,%rdi
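The register aliasing described above can be replayed with a small model. This is only a sketch under stated assumptions: it applies the three register-to-register moves between the bdget() and bdevname() calls on the path where neither je branch is taken, and it takes the starting %rax value from R13 and the %rbx value from RBX in the backtrace. The variable names are hypothetical.

```python
# Register state just after bdget() returns in get_dip_from_device().
# %rax is inferred from R13 in the backtrace, %rbx is taken from RBX.
regs = {
    "rax": 0xFFFF99905C9F1380,
    "rbx": 0xFFFF998FF4ED4680,
    "r13": 0,
    "rsi": 0,
    "rdi": 0,
}

# Register-to-register moves between the bdget() and bdevname() calls,
# written as (source, destination) to match AT&T operand order.
MOVES = [
    ("rax", "r13"),  # get_dip_from_device+56
    ("rbx", "rsi"),  # get_dip_from_device+71
    ("rax", "rdi"),  # get_dip_from_device+74
]

for source, destination in MOVES:
    regs[destination] = regs[source]

aliases = regs["rdi"] == regs["rax"] == regs["r13"]
print(f"rdi={regs['rdi']:#x} aliases rax and r13: {aliases}")
```

Because %r13 is not written again before the fault, its saved value is a reliable record of the first argument that bdevname() received.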
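The value of R13 is ffff99905c9f1380, which is therefore the value that %rax and %rdi held when bdevname() was called. The RAX and RDI values shown in the backtrace differ because bdevname() overwrites both registers before the faulting instruction:

    crash> bt | grep R13
        R13: ffff99905c9f1380  R14: ffff998ff4ed48c0  R15: ffff999054a50400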
Before the call to bdevname(), %rax = %rdi = ffff99905c9f1380. bdevname() loads %rax from offset 0x88 of this address, which reads back as 0, so the next dereference at 0x2d4(%rax) faults at address 0x2d4. The value passed to bdevname() therefore appears to be invalid, which needs to be verified by the module vendor:

    crash> dis -r ffffffffbb36cdba
    0xffffffffbb36cda0 <bdevname>:     data16 data16 data16 xchg %ax,%ax
    0xffffffffbb36cda5 <bdevname+5>:   push   %rbp
    0xffffffffbb36cda6 <bdevname+6>:   mov    0x88(%rdi),%rax     /* 0x0000000000000000 */
    0xffffffffbb36cdad <bdevname+13>:  mov    %rsi,%rdx
    0xffffffffbb36cdb0 <bdevname+16>:  mov    0x98(%rdi),%rdi
    0xffffffffbb36cdb7 <bdevname+23>:  mov    %rsp,%rbp
    0xffffffffbb36cdba <bdevname+26>:  mov    0x2d4(%rax),%esi    /* 0x00000000000002d4 */

    crash> px (0x88+0xffff99905c9f1380)
    $3 = 0xffff99905c9f1408

    crash> rd 0xffff99905c9f1408
    ffff99905c9f1408:  0000000000000000

    crash> px (0x2d4+0x0000000000000000)
    $4 = 0x2d4
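The two dependent loads can be replayed end to end as a rough model. In this sketch the dictionary holds the single word read with rd above, and any address missing from it is treated as unmapped, which is a simplification of real page-fault behaviour. The names PageFault, load, and replay_bdevname are hypothetical. In upstream 3.10 sources bdevname() reads bdev->bd_part->partno, so offsets 0x88 and 0x2d4 likely correspond to bd_part and partno, but that mapping is an assumption that would need to be confirmed against the matching debuginfo (for example with struct block_device -o in crash).

```python
class PageFault(Exception):
    """Raised by this model when a load touches an address it has no data for."""

    def __init__(self, address: int):
        super().__init__(f"fault at {address:#x}")
        self.address = address


# The only word of memory known from the session: rd 0xffff99905c9f1408.
KNOWN_MEMORY = {0xFFFF99905C9F1408: 0x0}


def load(address: int) -> int:
    """Return the word at address, or fault if the model has no data there."""
    if address not in KNOWN_MEMORY:
        raise PageFault(address)
    return KNOWN_MEMORY[address]


def replay_bdevname(rdi: int) -> int:
    """Replay the two dependent loads from the bdevname() disassembly."""
    rax = load(rdi + 0x88)    # bdevname+6:  mov 0x88(%rdi),%rax
    return load(rax + 0x2D4)  # bdevname+26: mov 0x2d4(%rax),%esi


try:
    replay_bdevname(0xFFFF99905C9F1380)
    fault_address = None
except PageFault as fault:
    fault_address = fault.address

print(hex(fault_address) if fault_address is not None else "no fault")  # prints 0x2d4
```

The first load succeeds and returns 0, and the second load then faults at 0x2d4, matching the address in the panic string.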
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.