Kernel panic: "Unable to handle kernel paging request at 0000000000200200" with RIP in list_del+0xb/0x6b

Environment

  • Red Hat Enterprise Linux 5.X
  • VMware

Issue

  • The server reboots after a kernel panic with the message "Unable to handle kernel paging request at 0000000000XXXXXX RIP: ".
  • The RIP is in list_del+0xb/0x6b, called from the vmxnet3_rq_destroy_all() function of the vmxnet3 module.

Resolution

The exception occurred in the unsigned vmxnet3 kernel module. In RHEL 5.X, the vmxnet3 module is provided by VMware, so contact VMware technical support for further investigation.

Root Cause

The panic is caused by list_del() dereferencing an invalid address (0x200200) found in a list entry passed to it by vmxnet3_rq_destroy_all(), a function of the vmxnet3 module.
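
The values found in the list entry are recognizable. The upstream kernel headers define LIST_POISON1 as 0x00100100 and LIST_POISON2 as 0x00200200, and list_del() writes them into an entry's next and prev pointers after unlinking it. The following is a minimal Python model (not kernel code) sketching one way such a fault could arise: list_del() being called twice on the same entry. Whether the vmxnet3 module actually did this is an assumption, and the Memory class, the PagingFault exception, and the addresses 0x1000 and 0x2000 are purely illustrative.

```python
# Illustrative model only: a tiny "address space" holding list_head nodes.
LIST_POISON1 = 0x00100100  # written to entry->next by list_del()
LIST_POISON2 = 0x00200200  # written to entry->prev by list_del()


class PagingFault(Exception):
    """Stands in for 'Unable to handle kernel paging request'."""


class Memory:
    def __init__(self):
        self.nodes = {}  # address -> {"next": addr, "prev": addr}

    def add_node(self, addr):
        # A freshly initialised list_head points at itself.
        self.nodes[addr] = {"next": addr, "prev": addr}

    def deref(self, addr):
        # Any address that is not a known node is treated as unmapped.
        if addr not in self.nodes:
            raise PagingFault(f"paging request at {addr:016x}")
        return self.nodes[addr]


def list_add(mem, new, head):
    nxt = mem.deref(head)["next"]
    mem.deref(nxt)["prev"] = new
    mem.deref(new)["next"] = nxt
    mem.deref(new)["prev"] = head
    mem.deref(head)["next"] = new


def list_del(mem, entry):
    node = mem.deref(entry)
    # Same check as lib/list_debug.c line 62: entry->prev->next != entry.
    if mem.deref(node["prev"])["next"] != entry:
        raise RuntimeError("list_del corruption")
    mem.deref(node["prev"])["next"] = node["next"]
    mem.deref(node["next"])["prev"] = node["prev"]
    node["next"] = LIST_POISON1
    node["prev"] = LIST_POISON2


def double_delete_fault():
    mem = Memory()
    head, entry = 0x1000, 0x2000  # hypothetical addresses
    mem.add_node(head)
    mem.add_node(entry)
    list_add(mem, entry, head)
    list_del(mem, entry)      # first delete poisons the entry
    try:
        list_del(mem, entry)  # second delete dereferences LIST_POISON2
    except PagingFault as fault:
        return str(fault)
    return None


print(double_delete_fault())
```

In this model the second call faults on the very first check, while reading entry->prev->next, which is the same source line at which the vmcore shows the kernel crashing.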

Diagnostic Steps

  • Analysis of vmcore

    crash> sys | grep  -e RELEASE -e PANIC
    RELEASE: 2.6.18-426.el5
    PANIC: "Unable to handle kernel paging request at 0000000000200200"
    
    crash> sys -i |head -n 5
    dmi_ident[1]: Phoenix Technologies LTD
    dmi_ident[2]: 6.00
    dmi_ident[3]: 09/21/2015
    dmi_ident[4]: VMware, Inc.
    dmi_ident[5]: VMware Virtual Platform
    
  • Kernel ring buffer

    Unable to handle kernel paging request at 0000000000200200 RIP: 
    [<ffffffff80162d7a>] list_del+0xb/0x6b                               <<<<<<<
    PGD 8000000383e03067 PUD 5a3573067 PMD 0 
    Oops: 0000 [1] SMP 
    last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
    CPU 1 
    Modules linked in: tcp_diag inet_diag nfs nfs_acl netconsole lockd sunrpc be2iscsi    ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i libcxgbi cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi vsock(U) vmmemctl(U) acpiphp dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac lp pvscsi(U) sg pcspkr ide_cd serio_raw tpm_tis i2c_piix4 parport_pc floppy parport tpm cdrom tpm_bios i2c_core vmci(U) vmxnet3(U) dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci ata_piix libata shpchp mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
    Pid: 51, comm: events/1 Tainted: G     --------------------    2.6.18-426.el5 #1
    RIP: 0010:[<ffffffff80162d7a>]  [<ffffffff80162d7a>] list_del+0xb/0x6b        <<<<<<<
    RSP: 0018:ffff81303fb59ad0  EFLAGS: 00010082
    RAX: 0000000000000013 RBX: 0000000000000087 RCX: 0000000000200200
    RDX: 0000000000000001 RSI: ffff81303d1c2180 RDI: ffff81303d1c2180
    RBP: ffff81303d1c2000 R08: 0000000000000000 R09: 0000000000000000
    R10: 00000000aba51f9d R11: 0000000000000006 R12: ffff81303fb59b34
    R13: ffff81303d1c2500 R14: 0000000000000032 R15: ffff81303f79dcc0
    FS:  0000000000000000(0000) GS:ffffffff8043d080(0000)      knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 0000000000200200 CR3: 0000001474426000 CR4: 00000000000006a0
    Process events/1 (pid: 51, threadinfo ffff81303fb58000, task ffff81183fa6d830)
    Stack:  ffff81303d1c2500 ffffffff88202e80 ffff81303d1c2500 ffff81303d1c2000
    ffff81303f79dcc0 ffff811c4b5a2bc0 ffffffff80511885 ffffffff8024d007
    0000000000000019 000000000000002f 0000000000000000 ffffffff80511885
    Call Trace:
    [<ffffffff88202e80>] :vmxnet3:vmxnet3_rq_destroy_all+0x824/0x1138
    [<ffffffff8024d007>] netpoll_poll_dev+0xa2/0x36c
    [<ffffffff8024d3ab>] netpoll_send_skb_on_dev+0xda/0xef
    [<ffffffff8868f0e1>] :netconsole:write_msg+0x49/0x60
    [<ffffffff8009a372>] __call_console_drivers+0x5b/0x69
    [<ffffffff800193d4>] release_console_sem+0x143/0x205
    [<ffffffff8009ab67>] vprintk+0x2b2/0x317
    [<ffffffff80115697>] proc_mkdir_mode+0x4c/0x63
    [<ffffffff800c67ee>] register_handler_proc+0x9e/0xb0
    [<ffffffff8009ac1e>] printk+0x52/0xbd
    [<ffffffff800c541e>] setup_irq+0x186/0x1cf
    [<ffffffff882030c0>] :vmxnet3:vmxnet3_rq_destroy_all+0xa64/0x1138
    [<ffffffff800c5517>] request_irq+0xb0/0xd6
    [<ffffffff882034ea>] :vmxnet3:vmxnet3_rq_destroy_all+0xe8e/0x1138
    [<ffffffff88202825>] :vmxnet3:vmxnet3_rq_destroy_all+0x1c9/0x1138
    [<ffffffff88203e80>] :vmxnet3:vmxnet3_activate_dev+0x3c/0x1cc
    [<ffffffff8820491c>] :vmxnet3:vmxnet3_set_ringsize+0xf8/0x1068
    [<ffffffff88204c68>] :vmxnet3:vmxnet3_set_ringsize+0x444/0x1068
    [<ffffffff8004fd93>] run_workqueue+0x9e/0xfb
    [<ffffffff8004c5e6>] worker_thread+0x0/0x122
    [<ffffffff8004c6d6>] worker_thread+0xf0/0x122
    [<ffffffff80095673>] default_wake_function+0x0/0xe
    [<ffffffff80034f33>] kthread+0xfe/0x132
    [<ffffffff8006bd41>] child_rip+0xa/0x11
    [<ffffffff80034e35>] kthread+0x0/0x132
    [<ffffffff8006bd37>] child_rip+0x0/0x11
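
The "Oops: 0000" line above carries the x86 page-fault error code. As a minimal sketch, its three low bits can be decoded as follows; the bit meanings come from the x86 architecture definition, while the function name is illustrative and not part of any tool.

```python
def decode_page_fault_error(code):
    """Decode the three low bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault raised in kernel mode, 1 = in user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


# "Oops: 0000" from the kernel ring buffer
print(decode_page_fault_error(int("0000", 16)))
```

A kernel-mode read of a non-present page is consistent with the empty PMD entry ("PMD 0") reported in the page-table dump for address 0000000000200200.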
    
    
  • The backtrace of the panicking task shows that the panic occurred in the function list_del()

    crash> bt
    PID: 51     TASK: ffff81183fa6d830  CPU: 1   COMMAND: "events/1"
    #0 [ffff81303fb59830] crash_kexec at ffffffff800b76e9
    #1 [ffff81303fb598f0] __die at ffffffff80066eb7
    #2 [ffff81303fb59930] do_page_fault at ffffffff80069425
    #3 [ffff81303fb59a20] error_exit at ffffffff800668d8
       [exception RIP: list_del+11]                                           <<<<<<<<
       RIP: ffffffff80162d7a  RSP: ffff81303fb59ad0  RFLAGS: 00010082
       RAX: 0000000000000013  RBX: 0000000000000087  RCX: 0000000000200200
       RDX: 0000000000000001  RSI: ffff81303d1c2180  RDI: ffff81303d1c2180
       RBP: ffff81303d1c2000   R8: 0000000000000000   R9: 0000000000000000
       R10: 00000000aba51f9d  R11: 0000000000000006  R12: ffff81303fb59b34
       R13: ffff81303d1c2500  R14: 0000000000000032  R15: ffff81303f79dcc0
       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    #4 [ffff81303fb59ad8] vmxnet3_rq_destroy_all at ffffffff88202e80 [vmxnet3]    <<<<<<<<
    #5 [ffff81303fb59bb8] __call_console_drivers at ffffffff8009a372
    #6 [ffff81303fb59bd8] release_console_sem at ffffffff800193d4
    #7 [ffff81303fb59c08] vprintk at ffffffff8009ab67
    #8 [ffff81303fb59c88] printk at ffffffff8009ac1e
    #9 [ffff81303fb59d78] vmxnet3_rq_destroy_all at ffffffff882034ea [vmxnet3]
    #10 [ffff81303fb59f48] kernel_thread at ffffffff8006bd41
    
    
  • Disassembly of exception RIP: list_del+11

    crash> dis -rl ffffffff80162d7a
    /usr/src/debug/kernel-2.6.18/linux-2.6.18-426.el5.x86_64/lib/list_debug.c: 61
    0xffffffff80162d6f <list_del>:   sub    $0x8,%rsp
    /usr/src/debug/kernel-2.6.18/linux-2.6.18-426.el5.x86_64/lib/list_debug.c: 62
    0xffffffff80162d73 <list_del+4>: mov    0x8(%rdi),%rcx
    /usr/src/debug/kernel-2.6.18/linux-2.6.18-426.el5.x86_64/lib/list_debug.c: 61
    0xffffffff80162d77 <list_del+8>: mov    %rdi,%rsi
    /usr/src/debug/kernel-2.6.18/linux-2.6.18-426.el5.x86_64/lib/list_debug.c: 62
    0xffffffff80162d7a <list_del+11>:    mov    (%rcx),%rdx       <<<<<<< Kernel crashed here
    
    
  • The corresponding kernel source, lib/list_debug.c

    ...
    54 /**
    55  * list_del - deletes entry from list.
    56  * @entry: the element to delete from the list.
    57  * Note: list_empty on entry does not return true after this, the entry is
    58  * in an undefined state.
    59  */
    60 void list_del(struct list_head *entry)
    61 {
    62         if (unlikely(entry->prev->next != entry)) {
    ....                             ^
                                     |_____kernel crashed here
    
    
  • The panic occurred while dereferencing the address stored in the register %rcx

    0xffffffff80162d7a <list_del+11>:    mov    (%rcx),%rdx
    
  • The address stored in the register %rcx at <list_del+11> is 0000000000200200.

  • Register %rcx is loaded at <list_del+4> from offset 0x8 of %rdi, that is, from entry->prev.

    0xffffffff80162d73 <list_del+4>: mov    0x8(%rdi),%rcx
    
  • The address in register %rdi is passed in by the caller, vmxnet3_rq_destroy_all(), and points to a struct list_head whose prev field holds the invalid address.

    crash> dis -rl ffffffff88202e80 | tail -n 3
    0xffffffff88202e74 <vmxnet3_rq_destroy_all+2072>:    lea    0x180(%rbp),%rdi
    0xffffffff88202e7b <vmxnet3_rq_destroy_all+2079>:    callq  0xffffffff80162d6f <list_del>
    0xffffffff88202e80 <vmxnet3_rq_destroy_all+2084>:    lock btrl $0x5,0x40(%rbp)
    
    crash> px list_del
    list_del = $1 = 
    {void (struct list_head *)} 0xffffffff80162d6f <list_del>
    
    crash> struct list_head 0xffff81303d1c2180
    struct list_head {
     next = 0x100100, 
     prev = 0x200200   <<<<<<< /* Invalid address */
    }
    crash> px (0xffff81303d1c2180+0x8)
    $2 = 0xffff81303d1c2188
    
    crash> rd 0xffff81303d1c2188
    ffff81303d1c2188:  0000000000200200                    
                           ^
                           |_____Invalid address 
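
The pointer arithmetic in the steps above can be cross-checked with a few lines of Python. The register values are copied from the exception frame; the 8-byte offset of prev assumes the usual x86_64 layout of struct list_head (two 8-byte pointers, next first), and the variable names are illustrative.

```python
RBP = 0xffff81303d1c2000   # %rbp from the exception frame
RDI = RBP + 0x180          # lea 0x180(%rbp),%rdi in vmxnet3_rq_destroy_all
PREV_FIELD = RDI + 0x8     # mov 0x8(%rdi),%rcx reads entry->prev from here

print(f"entry       = {RDI:#018x}")
print(f"&entry.prev = {PREV_FIELD:#018x}")
```

Both results match the dump: the entry address equals the %rdi value in the exception frame, and the address of its prev field is the location read with rd, which holds 0000000000200200.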
    
  • The function vmxnet3_rq_destroy_all() is part of the unsigned (U) module vmxnet3.

    crash> sym vmxnet3_rq_destroy_all
    ffffffff8820265c (t) vmxnet3_rq_destroy_all [vmxnet3]
    
    crash> mod -t | grep -e NAME -e vmxnet3
    NAME                   LICENSE_GPLOK
    vmxnet3                40(U)
    
    crash> module.state,name,version,srcversion,gpgsig_ok ffffffff8820bc00
    state      = MODULE_STATE_LIVE
    name       = "vmxnet3\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\..."
    version    = 0xffff81303f780440 "1.4.2.0"
    srcversion = 0x0
    gpgsig_ok  = 0
    

