Server panics in the vdd_irq_handler() function of [veloce_driver] because of a NULL pointer dereference

Solution Unverified

Environment

  • Red Hat Enterprise Linux
  • Third-Party Module [veloce_driver]

Issue

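  • The server panics with a NULL pointer dereference in vdd_irq_handler(), a function of the third-party [veloce_driver] module, and logs the following messages: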

[Sun May 21 03:23:57 EDT 2023] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[Sun May 21 03:23:57 EDT 2023] IP: [<ffffffffc0d1c733>] vdd_irq_handler+0x4a3/0x8a0 [veloce_driver]
[Sun May 21 03:23:57 EDT 2023] PGD 0
[Sun May 21 03:23:57 EDT 2023] Oops: 0000 [#1] SMP
..
[Sun May 21 03:23:57 EDT 2023]  sysimgblt fb_sys_fops i40e ttm tg3 crct10dif_pclmul drm crct10dif_common ptp smartpqi crc32c_intel drm_panel_orientation_quirks pps_core scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: vboxdrv]
[Sun May 21 03:23:57 EDT 2023] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-1160.el7.x86_64 #1
[Sun May 21 03:23:57 EDT 2023] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/08/2021
[Sun May 21 03:23:57 EDT 2023] task: ffffffff9cc18480 ti: ffffffff9cc00000 task.ti: ffffffff9cc00000
[Sun May 21 03:23:57 EDT 2023] RIP: 0010:[<ffffffffc0d1c733>]  [<ffffffffc0d1c733>] vdd_irq_handler+0x4a3/0x8a0 [veloce_driver]
[Sun May 21 03:23:57 EDT 2023] RSP: 0018:ffff97abbe603e08  EFLAGS: 00010046
[Sun May 21 03:23:57 EDT 2023] RAX: 0000000000000086 RBX: ffff982de7356800 RCX: ffff972efbb7c300
[Sun May 21 03:23:57 EDT 2023] RDX: 0000000000000001 RSI: ffff982de7356800 RDI: ffff982de7354a8c
[Sun May 21 03:23:57 EDT 2023] RBP: ffff97abbe603ea8 R08: 00000000ffff982d R09: ffffffff9cc03eb0
[Sun May 21 03:23:57 EDT 2023] R10: ffff982de7356818 R11: 0000000000000246 R12: 0000000000000001
[Sun May 21 03:23:57 EDT 2023] R13: 0000000000000000 R14: ffff982de7354a00 R15: 0000000000000002
[Sun May 21 03:23:57 EDT 2023] FS:  0000000000000000(0000) GS:ffff97abbe600000(0000) knlGS:0000000000000000
[Sun May 21 03:23:57 EDT 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun May 21 03:23:57 EDT 2023] CR2: 0000000000000018 CR3: 0000017807a10000 CR4: 0000000000340ff0
[Sun May 21 03:23:57 EDT 2023] Call Trace:
[Sun May 21 03:23:57 EDT 2023]  <IRQ>
[Sun May 21 03:23:57 EDT 2023]  [<ffffffffc0480c96>] ? i40e_napi_poll+0x5a6/0x7a0 [i40e]
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c14fe54>] __handle_irq_event_percpu+0x44/0x1c0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c150002>] handle_irq_event_percpu+0x32/0x80
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c15008c>] handle_irq_event+0x3c/0x60
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c15372d>] handle_fasteoi_irq+0x5d/0x110
[Sun May 21 03:23:57 EDT 2023] veloce_driver INFO: EP[52]: Allocate & Configure Device
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c02f5f4>] handle_irq+0xe4/0x1a0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c0de2ce>] ? sched_clock_idle_wakeup_event+0x1e/0x20
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c110e7d>] ? tick_check_idle+0xbd/0xd0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c79892d>] do_IRQ+0x4d/0xf0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c78a36a>] common_interrupt+0x16a/0x16a
[Sun May 21 03:23:57 EDT 2023]  <EOI>
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c789000>] ? __cpuidle_text_start+0x8/0x8
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c78924b>] ? native_safe_halt+0xb/0x20
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c78901e>] default_idle+0x1e/0xc0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c037ca0>] arch_cpu_idle+0x20/0xc0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c1011ea>] cpu_startup_entry+0x14a/0x1e0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c76f9c7>] rest_init+0x77/0x80
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9cd8b1cf>] start_kernel+0x44b/0x46c
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9cd8ab84>] ? repair_env_string+0x5c/0x5c
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9cd8a120>] ? early_idt_handler_array+0x120/0x120
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9cd8a738>] x86_64_start_reservations+0x24/0x26
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9cd8a88e>] x86_64_start_kernel+0x154/0x177
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c0000d5>] start_cpu+0x5/0x14
[Sun May 21 03:23:57 EDT 2023] Code: e2 a5 db e9 d9 fc ff ff 49 8d 86 8c 00 00 00 4c 89 55 88 48 89 c7 48 89 45 98 e8 b9 d3 a6 db 48 89 45 90 4d 8b 6e 28 4c 8b 55 88 <49> 8b 45 18 8b 48 04 83 f9 17 0f 86 2d 03 00 00 41 3b 8e 34 01
[Sun May 21 03:23:57 EDT 2023] RIP  [<ffffffffc0d1c733>] vdd_irq_handler+0x4a3/0x8a0 [veloce_driver]
[Sun May 21 03:23:57 EDT 2023]  RSP <ffff97abbe603e08>
[Sun May 21 03:23:57 EDT 2023] CR2: 0000000000000018

Resolution

  • The [veloce_driver] module is not shipped by Red Hat.
  • Engage the vendor of the [veloce_driver] module for further troubleshooting and diagnosis.

Diagnostic Steps

  • The vmcore shows that the kernel crashed inside the vdd_irq_handler() function of the [veloce_driver] module, as shown below.

System Information:

       CPUS: 96
        DATE: Sun May 21 03:23:57 EDT 2023
      UPTIME: 02:38:33
LOAD AVERAGE: 2.18, 1.43, 1.19
       TASKS: 1219
    NODENAME: local-hostname
     RELEASE: 3.10.0-1160.el7.x86_64
     VERSION: #1 SMP Tue Aug 18 14:50:17 EDT 2020
     MACHINE: x86_64  (2794 Mhz)
      MEMORY: 1023.8 GB
       PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000018"
         PID: 0
     COMMAND: "swapper/0"
        TASK: ffffffff9cc18480  (1 of 96)  [THREAD_INFO: ffffffff9cc00000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

Kernel Ring Buffer:

crash> log
..
[Sun May 21 03:23:57 EDT 2023] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018   <<<----
[Sun May 21 03:23:57 EDT 2023] IP: [<ffffffffc0d1c733>] vdd_irq_handler+0x4a3/0x8a0 [veloce_driver]    <<----
[Sun May 21 03:23:57 EDT 2023] PGD 0 
[Sun May 21 03:23:57 EDT 2023] Oops: 0000 [#1] SMP 
..
[Sun May 21 03:23:57 EDT 2023]  sysimgblt fb_sys_fops i40e ttm tg3 crct10dif_pclmul drm crct10dif_common ptp smartpqi crc32c_intel drm_panel_orientation_quirks pps_core scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: vboxdrv]
[Sun May 21 03:23:57 EDT 2023] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-1160.el7.x86_64 #1
[Sun May 21 03:23:57 EDT 2023] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 06/08/2021
[Sun May 21 03:23:57 EDT 2023] task: ffffffff9cc18480 ti: ffffffff9cc00000 task.ti: ffffffff9cc00000
[Sun May 21 03:23:57 EDT 2023] RIP: 0010:[<ffffffffc0d1c733>]  [<ffffffffc0d1c733>] vdd_irq_handler+0x4a3/0x8a0 [veloce_driver]
[Sun May 21 03:23:57 EDT 2023] RSP: 0018:ffff97abbe603e08  EFLAGS: 00010046
[Sun May 21 03:23:57 EDT 2023] RAX: 0000000000000086 RBX: ffff982de7356800 RCX: ffff972efbb7c300
[Sun May 21 03:23:57 EDT 2023] RDX: 0000000000000001 RSI: ffff982de7356800 RDI: ffff982de7354a8c
[Sun May 21 03:23:57 EDT 2023] RBP: ffff97abbe603ea8 R08: 00000000ffff982d R09: ffffffff9cc03eb0
[Sun May 21 03:23:57 EDT 2023] R10: ffff982de7356818 R11: 0000000000000246 R12: 0000000000000001
[Sun May 21 03:23:57 EDT 2023] R13: 0000000000000000 R14: ffff982de7354a00 R15: 0000000000000002
[Sun May 21 03:23:57 EDT 2023] FS:  0000000000000000(0000) GS:ffff97abbe600000(0000) knlGS:0000000000000000
[Sun May 21 03:23:57 EDT 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun May 21 03:23:57 EDT 2023] CR2: 0000000000000018 CR3: 0000017807a10000 CR4: 0000000000340ff0
[Sun May 21 03:23:57 EDT 2023] Call Trace:
[Sun May 21 03:23:57 EDT 2023]  <IRQ> 
[Sun May 21 03:23:57 EDT 2023]  [<ffffffffc0480c96>] ? i40e_napi_poll+0x5a6/0x7a0 [i40e]
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c14fe54>] __handle_irq_event_percpu+0x44/0x1c0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c150002>] handle_irq_event_percpu+0x32/0x80
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c15008c>] handle_irq_event+0x3c/0x60
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c15372d>] handle_fasteoi_irq+0x5d/0x110
[Sun May 21 03:23:57 EDT 2023] veloce_driver INFO: EP[52]: Allocate & Configure Device
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c02f5f4>] handle_irq+0xe4/0x1a0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c0de2ce>] ? sched_clock_idle_wakeup_event+0x1e/0x20
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c110e7d>] ? tick_check_idle+0xbd/0xd0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c79892d>] do_IRQ+0x4d/0xf0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c78a36a>] common_interrupt+0x16a/0x16a
[Sun May 21 03:23:57 EDT 2023]  <EOI> 
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c789000>] ? __cpuidle_text_start+0x8/0x8
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c78924b>] ? native_safe_halt+0xb/0x20
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c78901e>] default_idle+0x1e/0xc0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c037ca0>] arch_cpu_idle+0x20/0xc0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c1011ea>] cpu_startup_entry+0x14a/0x1e0
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c76f9c7>] rest_init+0x77/0x80
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9cd8b1cf>] start_kernel+0x44b/0x46c
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9cd8ab84>] ? repair_env_string+0x5c/0x5c
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9cd8a120>] ? early_idt_handler_array+0x120/0x120
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9cd8a738>] x86_64_start_reservations+0x24/0x26
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9cd8a88e>] x86_64_start_kernel+0x154/0x177
[Sun May 21 03:23:57 EDT 2023]  [<ffffffff9c0000d5>] start_cpu+0x5/0x14
[Sun May 21 03:23:57 EDT 2023] Code: e2 a5 db e9 d9 fc ff ff 49 8d 86 8c 00 00 00 4c 89 55 88 48 89 c7 48 89 45 98 e8 b9 d3 a6 db 48 89 45 90 4d 8b 6e 28 4c 8b 55 88 <49> 8b 45 18 8b 48 04 83 f9 17 0f 86 2d 03 00 00 41 3b 8e 34 01 
[Sun May 21 03:23:57 EDT 2023] RIP  [<ffffffffc0d1c733>] vdd_irq_handler+0x4a3/0x8a0 [veloce_driver]
[Sun May 21 03:23:57 EDT 2023]  RSP <ffff97abbe603e08>
[Sun May 21 03:23:57 EDT 2023] CR2: 0000000000000018
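
The faulting instruction can be read directly from the "Code:" line, where the byte in angle brackets is the first byte at the exception RIP. The following is a minimal Python sketch, not part of the original analysis, that decodes only the one instruction form seen here (REX.W + 8B with an 8-bit displacement). The helper names are hypothetical, and decoding the earlier load relies on the assumption that the eight bytes before the fault are two 4-byte instructions.

```python
# Bytes copied from the "Code:" line of the oops; <49> marks the faulting byte.
CODE = ("e2 a5 db e9 d9 fc ff ff 49 8d 86 8c 00 00 00 4c 89 55 88 48 89 c7 "
        "48 89 45 98 e8 b9 d3 a6 db 48 89 45 90 4d 8b 6e 28 4c 8b 55 88 "
        "<49> 8b 45 18 8b 48 04 83 f9 17 0f 86 2d 03 00 00 41 3b 8e 34 01")

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def split_code(code):
    """Return (bytes before the fault, bytes from the faulting instruction on)."""
    tokens = code.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    to_int = lambda t: int(t.strip("<>"), 16)
    return [to_int(t) for t in tokens[:start]], [to_int(t) for t in tokens[start:]]


def decode_mov_load(b):
    """Decode only 'REX.W 8B /r' with mod=01 (disp8) and no SIB byte."""
    rex, opcode, modrm, disp = b[0], b[1], b[2], b[3]
    assert rex & 0xF8 == 0x48 and opcode == 0x8B   # MOV r64, r/m64
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 1 and rm != 4                    # disp8 form, no SIB byte
    dest = REG64[reg | ((rex & 0x4) << 1)]         # REX.R extends the reg field
    base = REG64[rm | ((rex & 0x1) << 3)]          # REX.B extends the r/m field
    disp = disp - 256 if disp >= 128 else disp     # sign-extend the disp8
    return dest, base, disp


before, at_fault = split_code(CODE)
fault_insn = decode_mov_load(at_fault[:4])
# Assumption: the 8 bytes before the fault are two 4-byte instructions,
# the first of which loads the pointer that is dereferenced at the fault.
prev_load = decode_mov_load(before[-8:-4])

# Register values taken from the oops register dump.
regs = {"r13": 0x0, "r14": 0xffff982de7354a00}
dest, base, disp = fault_insn
fault_addr = regs[base] + disp

print(f"pointer load : mov {prev_load[0]}, [{prev_load[1]}{prev_load[2]:+#x}]")
print(f"faulting insn: mov {dest}, [{base}{disp:+#x}]")
print(f"fault address: {fault_addr:#018x}")
```

Under these assumptions, R13 is loaded from offset 0x28 of the object that R14 points to (ffff982de7354a00), that field is NULL, and the read at offset 0x18 from it yields the CR2 value 0000000000000018. The meaning of these structure fields is known only to the module vendor.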

Backtrace of panic task:

crash> bt 
PID: 0        TASK: ffffffff9cc18480  CPU: 0    COMMAND: "swapper/0"
 #0 [ffff97abbe603a90] machine_kexec at ffffffff9c066294
 #1 [ffff97abbe603af0] __crash_kexec at ffffffff9c122562
 #2 [ffff97abbe603bc0] crash_kexec at ffffffff9c122650
 #3 [ffff97abbe603bd8] oops_end at ffffffff9c78b798
 #4 [ffff97abbe603c00] no_context at ffffffff9c075d14
 #5 [ffff97abbe603c50] __bad_area_nosemaphore at ffffffff9c075fe2
 #6 [ffff97abbe603ca0] bad_area_nosemaphore at ffffffff9c076104
 #7 [ffff97abbe603cb0] __do_page_fault at ffffffff9c78e750
 #8 [ffff97abbe603d20] do_page_fault at ffffffff9c78e975
 #9 [ffff97abbe603d50] page_fault at ffffffff9c78a778
    [exception RIP: vdd_irq_handler+1187]                      <<<------
    RIP: ffffffffc0d1c733  RSP: ffff97abbe603e08  RFLAGS: 00010046
    RAX: 0000000000000086  RBX: ffff982de7356800  RCX: ffff972efbb7c300
    RDX: 0000000000000001  RSI: ffff982de7356800  RDI: ffff982de7354a8c
    RBP: ffff97abbe603ea8   R8: 00000000ffff982d   R9: ffffffff9cc03eb0
    R10: ffff982de7356818  R11: 0000000000000246  R12: 0000000000000001
    R13: 0000000000000000  R14: ffff982de7354a00  R15: 0000000000000002
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff97abbe603e00] vdd_irq_handler at ffffffffc0d1c727 [veloce_driver]    <<<----
#11 [ffff97abbe603e08] i40e_napi_poll at ffffffffc0480c96 [i40e]
#12 [ffff97abbe603eb0] __handle_irq_event_percpu at ffffffff9c14fe54
#13 [ffff97abbe603ef8] handle_irq_event_percpu at ffffffff9c150002
#14 [ffff97abbe603f28] handle_irq_event at ffffffff9c15008c
#15 [ffff97abbe603f50] handle_fasteoi_irq at ffffffff9c15372d
#16 [ffff97abbe603f70] handle_irq at ffffffff9c02f5f4
#17 [ffff97abbe603fb8] do_IRQ at ffffffff9c79892d
--- <IRQ stack> ---
#18 [ffffffff9cc03e08] ret_from_intr at ffffffff9c78a36a
    [exception RIP: native_safe_halt+11]
    RIP: ffffffff9c78924b  RSP: ffffffff9cc03eb0  RFLAGS: 00000246
    RAX: ffffffff9c789000  RBX: 0000000000000000  RCX: 0100000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000046
    RBP: ffffffff9cc03eb0   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000246  R12: ffff972e93f50640
    R13: 0000000000000000  R14: ffff97abbe61acc0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffde  CS: 0010  SS: 0018
#19 [ffffffff9cc03eb8] default_idle at ffffffff9c78901e
#20 [ffffffff9cc03ed8] arch_cpu_idle at ffffffff9c037ca0
#21 [ffffffff9cc03ee8] cpu_startup_entry at ffffffff9c1011ea
#22 [ffffffff9cc03f30] rest_init at ffffffff9c76f9c7
#23 [ffffffff9cc03f40] start_kernel at ffffffff9cd8b1cf
#24 [ffffffff9cc03f88] x86_64_start_reservations at ffffffff9cd8a738
#25 [ffffffff9cc03f98] x86_64_start_kernel at ffffffff9cd8a88e
#26 [ffffffff9cc03ff0] start_cpu at ffffffff9c0000d5
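
Frames #7 to #9 show that the exception was a page fault, and the "Oops: 0000" line in the ring buffer carries the x86 page-fault error code. The following Python sketch decodes that code using the architectural bit layout; the function name is hypothetical and used only for illustration.

```python
def decode_pf_error_code(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
        "reserved_bit": bool(code & 0x8),
        "instruction_fetch": bool(code & 0x10),
    }


# "Oops: 0000" from the kernel ring buffer.
info = decode_pf_error_code(int("0000", 16))
for key, value in info.items():
    print(f"{key:18}: {value}")
```

An error code of 0000 therefore describes a kernel-mode read of a page that is not mapped, which is consistent with a NULL pointer dereference at address 0x18.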

Symbol address of the crashing function:

crash> sym vdd_irq_handler
ffffffffc0d1c290 (t) vdd_irq_handler [veloce_driver]   <<-----
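
The symbol address ties the different views of the crash location together. The short Python sketch below is an arithmetic cross-check added for illustration; all values are copied from the crash output above.

```python
SYM_BASE = 0xffffffffc0d1c290        # crash> sym vdd_irq_handler
EXCEPTION_RIP = 0xffffffffc0d1c733   # RIP in the oops and in the bt exception frame
FRAME10_ADDR = 0xffffffffc0d1c727    # frame #10 of bt
FUNC_SIZE = 0x8a0                    # from "vdd_irq_handler+0x4a3/0x8a0"

offset = EXCEPTION_RIP - SYM_BASE
frame_gap = EXCEPTION_RIP - FRAME10_ADDR

print(f"offset into function: {offset:#x} ({offset} decimal)")
print(f"inside function body: {0 <= offset < FUNC_SIZE}")
print(f"frame #10 is {frame_gap} bytes before the exception RIP")
```

The offset 0x4a3 in the ring buffer and the decimal 1187 printed by bt are the same location. Frame #10 points 12 bytes earlier, which matches the three 4-byte instructions between the `e8` call and the marked byte on the "Code:" line, so it looks like the return address of the last call made before the fault.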

Third-party Modules:

crash> mod -t
NAME           TAINTS
vboxnetflt     OE
vboxpci        OE
vboxnetadp     OE
vboxdrv        OE
veloce_driver  OE    <<<-----
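
The OE taints reported by mod -t, and the "Tainted: G W OE" string in the ring buffer, can be decoded with the standard kernel taint letters. The Python sketch below is illustrative only: the table lists a subset of the flags, and the descriptions paraphrase the kernel documentation.

```python
# Subset of the kernel taint letters (descriptions paraphrased).
TAINT_FLAGS = {
    "G": "no proprietary module loaded (all modules GPL-compatible)",
    "P": "proprietary module loaded",
    "F": "module was force-loaded",
    "D": "kernel has oopsed before",
    "W": "a kernel warning was issued earlier",
    "O": "out-of-tree (externally built) module loaded",
    "E": "unsigned module loaded",
}


def decode_taint(taint):
    """Return (flag, description) pairs for each letter in a taint string."""
    return [(c, TAINT_FLAGS.get(c, "not in this table")) for c in taint if not c.isspace()]


kernel_taint = decode_taint("G        W  OE")   # from the "Tainted:" field
module_taint = decode_taint("OE")               # veloce_driver in "mod -t"

for flag, meaning in kernel_taint:
    print(f"{flag}: {meaning}")
```

The O and E flags mark veloce_driver as an out-of-tree, unsigned module, which is why the Resolution refers the issue to the module vendor.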

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.