Kernel panic with null pointer dereference at be_insert_vlan_in_pkt

Solution Verified

Environment

  • Red Hat Enterprise Linux 5
    • kernel-2.6.18-371.el5
  • Emulex OneConnect 10GbE network interface using be2net driver
    • be2net version v4.9.311.1 provided by HP's kmod-be2net-4.9.311.1-1.rhel5u10 RPM package
    • NIC firmware 4.9.416.0

Issue

  • Kernel panic with a NULL pointer dereference in be_insert_vlan_in_pkt+702 (be_insert_vlan_in_pkt+0x2be in hexadecimal):
PID: 15565  TASK: ffff8125d2e3a830  CPU: 19  COMMAND: "processname"
 #0 [ffff81203fe2f9f0] crash_kexec at ffffffff800b1509
 #1 [ffff81203fe2fab0] __die at ffffffff80065137
 #2 [ffff81203fe2faf0] do_page_fault at ffffffff80067430
 #3 [ffff81203fe2fbe0] error_exit at ffffffff8005ddf9
    [exception RIP: be_insert_vlan_in_pkt+702]
    RIP: ffffffff882bf8a6  RSP: ffff81203fe2fc90  RFLAGS: 00010282
    RAX: 0000000000000024  RBX: 0000000000000000  RCX: ffffffff803270a8
    RDX: ffffffff803270a8  RSI: 0000000000000000  RDI: ffffffff803270a0
    RBP: 0000000000000000   R8: ffffffff803270a8   R9: 0000000000000001
    R10: 0000000000000740  R11: 00000000000000ae  R12: 00000000000003e8
    R13: ffff8120390d8500  R14: ffff81203fe2fd07  R15: ffff8120390d9280
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #4 [ffff81203fe2fcb8] be_xmit at ffffffff882c1c47 [be2net]
 #5 [ffff81203fe2fd38] dev_hard_start_xmit at ffffffff8023bff3
 #6 [ffff81203fe2fd68] __qdisc_run at ffffffff8024c1ed
 #7 [ffff81203fe2fda8] dev_queue_xmit at ffffffff8002fbe7
 #8 [ffff81203fe2fdd8] bond_dev_queue_xmit at ffffffff8849bf3b [bonding]
 #9 [ffff81203fe2fe08] bond_xmit_activebackup at ffffffff8849dc3d [bonding]
#10 [ffff81203fe2fe28] dev_hard_start_xmit at ffffffff8023bff3
#11 [ffff81203fe2fe58] dev_queue_xmit at ffffffff8002fc5c
#12 [ffff81203fe2fe88] arp_xmit at ffffffff8026c241
#13 [ffff81203fe2fe98] arp_solicit at ffffffff8026d42e
#14 [ffff81203fe2fee8] neigh_timer_handler at ffffffff80242adc
#15 [ffff81203fe2ff08] run_timer_softirq at ffffffff8009a86e
#16 [ffff81203fe2ff58] __do_softirq at ffffffff800125a2
#17 [ffff81203fe2ff88] call_softirq at ffffffff8005e30c
#18 [ffff81203fe2ffa0] do_softirq at ffffffff8006d644
#19 [ffff81203fe2ffb0] apic_timer_interrupt at ffffffff8005dc9e
--- <IRQ stack> ---
#20 [ffff8128ca2779c8] apic_timer_interrupt at ffffffff8005dc9e
#21 [ffff8128ca277b60] shrink_zone at ffffffff800132cc
#22 [ffff8128ca277ba0] try_to_free_pages at ffffffff800cfff1
#23 [ffff8128ca277c30] __alloc_pages at ffffffff8000f60a
#24 [ffff8128ca277ca0] tcp_sendmsg at ffffffff80026844
#25 [ffff8128ca277d30] do_sock_write at ffffffff800380a6
#26 [ffff8128ca277d60] sock_aio_write at ffffffff800479bc
#27 [ffff8128ca277e00] do_sync_write at ffffffff800183ce
#28 [ffff8128ca277f10] vfs_write at ffffffff80016b5e
#29 [ffff8128ca277f40] sys_write at ffffffff80017414
#30 [ffff8128ca277f80] tracesys at ffffffff8005d29e (via system_call)
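crash reports the faulting offset in decimal (+702), while the kernel's oops message prints the same location in hexadecimal (+0x2be), so both strings identify one instruction. A minimal Python sketch of normalising either form; `parse_symbol_offset` is a hypothetical helper, not part of crash:

```python
import re


def parse_symbol_offset(text: str) -> tuple[str, int]:
    """Split 'symbol+offset' into (symbol, offset in bytes).

    A 0x prefix means hexadecimal (kernel oops style); bare digits are
    decimal (crash backtrace style).
    """
    match = re.fullmatch(r"([A-Za-z_][A-Za-z0-9_.]*)\+(0x[0-9a-fA-F]+|[0-9]+)", text)
    if match is None:
        raise ValueError(f"not a symbol+offset string: {text!r}")
    symbol, offset = match.groups()
    base = 16 if offset.startswith("0x") else 10
    return symbol, int(offset, base)


decimal_form = parse_symbol_offset("be_insert_vlan_in_pkt+702")
hex_form = parse_symbol_offset("be_insert_vlan_in_pkt+0x2be")
print(decimal_form == hex_form, hex(decimal_form[1]))  # True 0x2be
```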

Resolution

A third-party be2net driver is in use.

Such software is not within the scope of Red Hat technical support.

The issue may be referred to the provider of the third-party driver, or reproduced with the be2net driver that Red Hat supply in the kernel package, which Red Hat do support.

Diagnostic Steps

The system panicked due to a NULL pointer dereference in the Emulex be2net driver code.

crash> sys | grep PANIC
       PANIC: "Unable to handle kernel NULL pointer dereference at 00000000000000d8 RIP: "
crash> bt
PID: 15565  TASK: ffff8125d2e3a830  CPU: 19  COMMAND: "processname"
 #0 [ffff81203fe2f9f0] crash_kexec at ffffffff800b1509
 #1 [ffff81203fe2fab0] __die at ffffffff80065137
 #2 [ffff81203fe2faf0] do_page_fault at ffffffff80067430
 #3 [ffff81203fe2fbe0] error_exit at ffffffff8005ddf9
    [exception RIP: be_insert_vlan_in_pkt+702]
    RIP: ffffffff882bf8a6  RSP: ffff81203fe2fc90  RFLAGS: 00010282
    RAX: 0000000000000024  RBX: 0000000000000000  RCX: ffffffff803270a8
    RDX: ffffffff803270a8  RSI: 0000000000000000  RDI: ffffffff803270a0
    RBP: 0000000000000000   R8: ffffffff803270a8   R9: 0000000000000001
    R10: 0000000000000740  R11: 00000000000000ae  R12: 00000000000003e8
    R13: ffff8120390d8500  R14: ffff81203fe2fd07  R15: ffff8120390d9280
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #4 [ffff81203fe2fcb8] be_xmit at ffffffff882c1c47 [be2net]
 #5 [ffff81203fe2fd38] dev_hard_start_xmit at ffffffff8023bff3
 #6 [ffff81203fe2fd68] __qdisc_run at ffffffff8024c1ed
 #7 [ffff81203fe2fda8] dev_queue_xmit at ffffffff8002fbe7
 #8 [ffff81203fe2fdd8] bond_dev_queue_xmit at ffffffff8849bf3b [bonding]
 #9 [ffff81203fe2fe08] bond_xmit_activebackup at ffffffff8849dc3d [bonding]
#10 [ffff81203fe2fe28] dev_hard_start_xmit at ffffffff8023bff3
#11 [ffff81203fe2fe58] dev_queue_xmit at ffffffff8002fc5c
#12 [ffff81203fe2fe88] arp_xmit at ffffffff8026c241
#13 [ffff81203fe2fe98] arp_solicit at ffffffff8026d42e
#14 [ffff81203fe2fee8] neigh_timer_handler at ffffffff80242adc
#15 [ffff81203fe2ff08] run_timer_softirq at ffffffff8009a86e
#16 [ffff81203fe2ff58] __do_softirq at ffffffff800125a2
#17 [ffff81203fe2ff88] call_softirq at ffffffff8005e30c
#18 [ffff81203fe2ffa0] do_softirq at ffffffff8006d644
#19 [ffff81203fe2ffb0] apic_timer_interrupt at ffffffff8005dc9e
--- <IRQ stack> ---
...

crash> dis -r ffffffff882bf8a6 | tail
0xffffffff882bf877 <be_insert_vlan_in_pkt+655>: rol    $0x8,%r12w
0xffffffff882bf87c <be_insert_vlan_in_pkt+660>: lea    0x4(%rbx),%rsi
0xffffffff882bf880 <be_insert_vlan_in_pkt+664>: mov    %rbx,%rdi
0xffffffff882bf883 <be_insert_vlan_in_pkt+667>: callq  0xffffffff800233d5 <memmove>
0xffffffff882bf888 <be_insert_vlan_in_pkt+672>: movw   $0x81,0xc(%rbx)
0xffffffff882bf88e <be_insert_vlan_in_pkt+678>: mov    %r12w,0xe(%rbx)
0xffffffff882bf893 <be_insert_vlan_in_pkt+683>: subq   $0x4,0x40(%rbp)
0xffffffff882bf898 <be_insert_vlan_in_pkt+688>: subq   $0x4,0x38(%rbp)
0xffffffff882bf89d <be_insert_vlan_in_pkt+693>: movw   $0x81,0x9e(%rbp)
0xffffffff882bf8a6 <be_insert_vlan_in_pkt+702>: mov    0xd8(%rbp),%rax
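The faulting instruction, `mov 0xd8(%rbp),%rax`, loads from the address %rbp + 0xd8. RBP is 0000000000000000 in the exception frame, so the load targets 00000000000000d8, the same address reported in the PANIC line. The sketch below repeats that arithmetic from two lines of the register dump; the helper names are hypothetical and it assumes crash's `NAME: <16 hex digits>` register layout.

```python
import re

# Two lines of the exception-frame register dump shown by `bt`.
REGISTER_DUMP = """\
    RAX: 0000000000000024  RBX: 0000000000000000  RCX: ffffffff803270a8
    RBP: 0000000000000000   R8: ffffffff803270a8   R9: 0000000000000001
"""


def parse_registers(dump: str) -> dict[str, int]:
    """Collect 'NAME: <16 hex digits>' pairs into {NAME: value}."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9_]*):\s+([0-9a-f]{16})\b", dump)
    return {name: int(value, 16) for name, value in pairs}


def effective_address(displacement: int, base: str, registers: dict[str, int]) -> int:
    """Address accessed by an AT&T 'disp(%base)' memory operand: base + disp."""
    return registers[base.upper()] + displacement


regs = parse_registers(REGISTER_DUMP)
fault = effective_address(0xd8, "rbp", regs)  # mov 0xd8(%rbp),%rax
print(f"{fault:016x}")  # 00000000000000d8
```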

It looks like the driver was attempting to insert a VLAN tag into a packet passed to it by the be_xmit() function.
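The instructions leading up to the fault look like a standard IEEE 802.1Q tag insertion: memmove() shifts the start of the frame (presumably the two 6-byte MAC addresses; the length is set up before the lines shown) 4 bytes towards the front, `movw $0x81,0xc(%rbx)` stores the bytes 81 00, which is the TPID 0x8100 in network byte order, at byte offset 12, and the byte-swapped (`rol $0x8,%r12w`) tag control information goes at offset 14. The Python sketch below illustrates the same layout on a bytes object. It is not the driver's code, and the ARP frame and VLAN ID 100 are hypothetical.

```python
import struct

ETH_ALEN = 6            # bytes in one MAC address
ETH_P_8021Q = 0x8100    # 802.1Q Tag Protocol Identifier (TPID)


def insert_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Return a copy of an untagged Ethernet frame with an 802.1Q tag added.

    The 4-byte tag goes right after the destination and source MAC
    addresses (byte offset 12): TPID first, then the tag control
    information (TCI), both big-endian (network byte order).
    """
    if len(frame) < 2 * ETH_ALEN + 2:
        raise ValueError("frame too short to hold MAC addresses and EtherType")
    if not 0 <= vlan_id < 4096:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority < 8:
        raise ValueError("priority must fit in 3 bits")
    tci = (priority << 13) | vlan_id  # PCP (3 bits) | DEI (1 bit, 0) | VID (12 bits)
    tag = struct.pack("!HH", ETH_P_8021Q, tci)
    return frame[:2 * ETH_ALEN] + tag + frame[2 * ETH_ALEN:]


# Hypothetical untagged ARP frame: broadcast destination, made-up source MAC.
untagged = bytes.fromhex("ffffffffffff" "020000000001" "0806") + bytes(28)
tagged = insert_vlan_tag(untagged, vlan_id=100)
print(tagged[12:18].hex())  # 810000640806
```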

crash> whatis be_xmit
int be_xmit(struct sk_buff *, struct net_device *);

crash> dis -r ffffffff8023bff3 | tail
0xffffffff8023bfd1 <dev_hard_start_xmit+405>:   mov    %rax,0x58(%rbp)
0xffffffff8023bfd5 <dev_hard_start_xmit+409>:   jmp    0xffffffff8023bfdf <dev_hard_start_xmit+419>
0xffffffff8023bfd7 <dev_hard_start_xmit+411>:   test   %eax,%eax
0xffffffff8023bfd9 <dev_hard_start_xmit+413>:   jne    0xffffffff8023c0d2 <dev_hard_start_xmit+662>
0xffffffff8023bfdf <dev_hard_start_xmit+419>:   cmpq   $0x0,0x0(%rbp)
0xffffffff8023bfe4 <dev_hard_start_xmit+424>:   jne    0xffffffff8023c053 <dev_hard_start_xmit+535>
0xffffffff8023bfe6 <dev_hard_start_xmit+426>:   mov    %r14,%rsi    <--
0xffffffff8023bfe9 <dev_hard_start_xmit+429>:   mov    %rbp,%rdi    <--
0xffffffff8023bfec <dev_hard_start_xmit+432>:   callq  *0x290(%r14)
0xffffffff8023bff3 <dev_hard_start_xmit+439>:   test   %eax,%eax
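Under the x86_64 System V calling convention the first two pointer arguments travel in %rdi and %rsi, so the two marked `mov` instructions set up the call to be_xmit() with the sk_buff taken from %rbp and the net_device from %r14 before the indirect call through 0x290(%r14). A rough Python sketch of that mapping; `argument_sources` is a hypothetical helper that only understands register-to-register `mov`, and the parameter names skb and dev are illustrative labels, since `whatis` prints only the types.

```python
import re

# x86_64 System V ABI: the first six integer/pointer arguments, in order.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def argument_sources(disassembly: str, params: list[str]) -> dict[str, str]:
    """Map each parameter to the register its value was copied from.

    Only 'mov %src,%dst' register-to-register moves before the last
    callq are considered; a parameter whose argument register was not
    loaded by such a move maps to the argument register itself.
    """
    before_call = disassembly.rsplit("callq", 1)[0]
    loaded = {}
    for src, dst in re.findall(r"\bmov\s+%(\w+),%(\w+)", before_call):
        loaded[dst] = src  # a later move into the same register wins
    return {name: loaded.get(reg, reg) for name, reg in zip(params, ARG_REGISTERS)}


DIS = """\
0xffffffff8023bfe6 <dev_hard_start_xmit+426>:   mov    %r14,%rsi
0xffffffff8023bfe9 <dev_hard_start_xmit+429>:   mov    %rbp,%rdi
0xffffffff8023bfec <dev_hard_start_xmit+432>:   callq  *0x290(%r14)
"""
# Types from `whatis be_xmit`: int be_xmit(struct sk_buff *, struct net_device *)
print(argument_sources(DIS, ["skb", "dev"]))  # {'skb': 'rbp', 'dev': 'r14'}
```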

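be_xmit() pushes %r14 and %rbp, which hold the net_device and sk_buff in dev_hard_start_xmit(), onto its stack before reusing them, so those values can be read back from be_xmit()'s stack frame:

crash> dis -r ffffffff882c1c47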
0xffffffff882c1a69 <be_xmit>:   push   %r15
0xffffffff882c1a6b <be_xmit+2>: mov    %rsi,%r15
0xffffffff882c1a6e <be_xmit+5>: add    $0x1280,%r15
0xffffffff882c1a75 <be_xmit+12>:        push   %r14 <--
0xffffffff882c1a77 <be_xmit+14>:        mov    %rsi,%r14
0xffffffff882c1a7a <be_xmit+17>:        add    $0x500,%r14
0xffffffff882c1a81 <be_xmit+24>:        push   %r13
0xffffffff882c1a83 <be_xmit+26>:        push   %r12
0xffffffff882c1a85 <be_xmit+28>:        mov    %rsi,%r12
0xffffffff882c1a88 <be_xmit+31>:        add    $0x1288,%r12
0xffffffff882c1a8f <be_xmit+38>:        push   %rbp <--

crash> bt -f | awk '/#4/,/#5/'
 #4 [ffff81203fe2fcb8] be_xmit at ffffffff882c1c47 [be2net]
    ffff81203fe2fcc0: ffff81403fffe4e0 ffff8120390d8000 
    ffff81203fe2fcd0: ffff81403fffe4c0 0000000000000001 
    ffff81203fe2fce0: ffff81403fffe4c0 ffffffff80017a4e 
    ffff81203fe2fcf0: 0000000000000000 0000022000000040 
    ffff81203fe2fd00: 0000000000000003 ffff8120390d8000 
    ffff81203fe2fd10: ffff8137fbb410c0 0000000000000000   <-- saved RBP, R12
    ffff81203fe2fd20: ffff81403e494600 ffff8120390d8000   <-- saved R13, R14
    ffff81203fe2fd30: 000000019b01a43f ffffffff8023bff3   <-- saved R15, RIP
 #5 [ffff81203fe2fd38] dev_hard_start_xmit at ffffffff8023bff3
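Because each `push` stores 8 bytes just below the previous one, starting immediately below the return address, the save slots follow from the push order in the disassembly. The sketch below assumes the return address into dev_hard_start_xmit() (ffffffff8023bff3) sits at ffff81203fe2fd38, as the `bt -f` output shows, and that be_xmit() begins with the five pushes listed; the helper names are hypothetical.

```python
# Order of the callee-saved pushes at the start of be_xmit() (from `dis` above).
PUSH_ORDER = ["r15", "r14", "r13", "r12", "rbp"]

# Three lines of the `bt -f` frame dump for #4, without annotations.
BT_F_DUMP = """\
    ffff81203fe2fd10: ffff8137fbb410c0 0000000000000000
    ffff81203fe2fd20: ffff81403e494600 ffff8120390d8000
    ffff81203fe2fd30: 000000019b01a43f ffffffff8023bff3
"""
RETURN_SLOT = 0xffff81203fe2fd38      # holds the return address into dev_hard_start_xmit()
RETURN_ADDRESS = 0xffffffff8023bff3


def parse_stack(dump: str) -> dict[int, int]:
    """Turn 'address: word word ...' lines into {stack address: 64-bit value}."""
    stack = {}
    for line in dump.splitlines():
        address, sep, words = line.partition(":")
        if not sep:
            continue  # skip lines without an address column
        base = int(address.strip(), 16)
        for index, word in enumerate(words.split()):
            stack[base + 8 * index] = int(word, 16)
    return stack


def saved_registers(stack: dict[int, int], return_slot: int,
                    push_order: list[str]) -> dict[str, int]:
    """The n-th push (counting from 1) lands 8 * n bytes below the return-address slot."""
    return {reg: stack[return_slot - 8 * (n + 1)] for n, reg in enumerate(push_order)}


stack = parse_stack(BT_F_DUMP)
if stack[RETURN_SLOT] != RETURN_ADDRESS:
    raise ValueError("return address is not where expected")
saved = saved_registers(stack, RETURN_SLOT, PUSH_ORDER)
print(f"sk_buff    (saved rbp): {saved['rbp']:016x}")  # ffff8137fbb410c0
print(f"net_device (saved r14): {saved['r14']:016x}")  # ffff8120390d8000
```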

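The saved RBP, ffff8137fbb410c0, is the sk_buff and the saved R14, ffff8120390d8000, is the net_device that dev_hard_start_xmit() passed to be_xmit(). The packet handed to the driver was therefore not NULL, which suggests the NULL pointer that was dereferenced arose within the be2net driver code itself.

The system has since been updated to the latest kernel, 2.6.18-406.el5.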

Whilst the kernel panic is in the be2net driver module, the driver in use is not supplied or supported by Red Hat.

The (U) flag marks an unsigned driver, that is, a driver not distributed with the Red Hat kernel:

crash> mod -t | grep U
cciss              40(U)
be2net             40(U)
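Filtering on the flag itself, rather than with a bare `grep U`, avoids also matching lines that merely contain a capital U elsewhere. A hypothetical Python helper, assuming the `name  value(FLAGS)` layout shown above:

```python
import re

MOD_T_LINES = """\
cciss              40(U)
be2net             40(U)
"""


def unsigned_modules(mod_t_output: str) -> list[str]:
    """Return module names whose parenthesised taint flags include U."""
    names = []
    for line in mod_t_output.splitlines():
        match = re.match(r"\s*(\S+)\s+\S*\(([A-Z]+)\)\s*$", line)
        if match and "U" in match.group(2):
            names.append(match.group(1))
    return names


print(unsigned_modules(MOD_T_LINES))  # ['cciss', 'be2net']
```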

In the sosreport, the driver's path is under weak-updates/, a location whose modules override the in-kernel ones:

$ grep be2net sos_commands/kernel/modinfo*
filename:       /lib/modules/2.6.18-406.el5/weak-updates/be2net/be2net.ko
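The first directory below /lib/modules/<release>/ shows where a module came from: kernel/ holds modules shipped in the Red Hat kernel package, while weak-updates/ typically holds links, maintained by the weak-modules script, to kABI-compatible modules from add-on kmod packages built against an earlier kernel. A minimal sketch of that check; `module_origin` is a hypothetical helper that only distinguishes the two directories seen in this article.

```python
from pathlib import PurePosixPath


def module_origin(path: str) -> tuple[str, str]:
    """Return (kernel release, origin) for a path under /lib/modules/<release>/.

    Only 'kernel/...' (shipped in the kernel package) and
    'weak-updates/...' are told apart; anything else is 'other'.
    """
    parts = PurePosixPath(path).parts
    # e.g. ('/', 'lib', 'modules', '2.6.18-406.el5', 'weak-updates', 'be2net', 'be2net.ko')
    if len(parts) < 6 or parts[:3] != ("/", "lib", "modules"):
        raise ValueError(f"not a module path under /lib/modules: {path}")
    release, top_dir = parts[3], parts[4]
    origin = {"kernel": "in-kernel", "weak-updates": "weak-updates"}.get(top_dir, "other")
    return release, origin


print(module_origin("/lib/modules/2.6.18-406.el5/weak-updates/be2net/be2net.ko"))
# ('2.6.18-406.el5', 'weak-updates')
```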

This driver appears to come from the following HP Emulex 10GbE Drivers for Red Hat Enterprise Linux 5 package:

$ grep be2net installed-rpms 
kmod-be2net-4.9.311.1-1.rhel5u10.x86_64                     Wed 23 Sep 2015 12:18:25 AM WIT
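The package name follows RPM's name-version-release.arch pattern, and its version field, 4.9.311.1, is the same as the be2net module version reported by modinfo from the sosreport. A minimal sketch of splitting such a string, assuming no epoch is present; `split_rpm_nvra` is a hypothetical helper.

```python
def split_rpm_nvra(nvra: str) -> dict[str, str]:
    """Split 'name-version-release.arch' (no epoch) into its four fields.

    Version and release cannot contain '-', so splitting from the right
    is safe even when the name itself does (as in 'kmod-be2net').
    """
    rest, _, arch = nvra.rpartition(".")
    name_version, _, release = rest.rpartition("-")
    name, _, version = name_version.rpartition("-")
    return {"name": name, "version": version, "release": release, "arch": arch}


line = "kmod-be2net-4.9.311.1-1.rhel5u10.x86_64                     Wed 23 Sep 2015 12:18:25 AM WIT"
print(split_rpm_nvra(line.split()[0]))
# {'name': 'kmod-be2net', 'version': '4.9.311.1', 'release': '1.rhel5u10', 'arch': 'x86_64'}
```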

Please refer this issue to the third-party provider of the be2net module in use, which appears to be HP.

You are welcome to try to reproduce the issue with the in-kernel driver, which Red Hat do supply and support. However, please note that RHEL 5's in-kernel driver is a much older version than the one installed:

# modinfo be2net | egrep "^filename|^version"
filename:       /lib/modules/2.6.18-406.el5/kernel/drivers/net/benet/be2net.ko
version:        4.2.116r

$ grep -A10 be2net sos_commands/kernel/modinfo* | egrep "^filename|^version"
filename:       /lib/modules/2.6.18-406.el5/weak-updates/be2net/be2net.ko
version:        4.9.311.1
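Comparing the leading numbers of each dot-separated field makes the gap explicit. This is a deliberate simplification, not rpm's version-comparison algorithm: it ignores non-numeric suffixes such as the trailing r in 4.2.116r.

```python
import re


def numeric_version(version: str) -> tuple[int, ...]:
    """Leading integer of each dot-separated field, stopping at the first field without one.

    '4.2.116r' -> (4, 2, 116): the trailing letter is ignored.
    """
    fields = []
    for field in version.split("."):
        match = re.match(r"\d+", field)
        if match is None:
            break
        fields.append(int(match.group()))
    return tuple(fields)


in_kernel = numeric_version("4.2.116r")      # RHEL 5 in-kernel be2net
third_party = numeric_version("4.9.311.1")   # HP kmod-be2net
print(in_kernel, third_party, in_kernel < third_party)  # (4, 2, 116) (4, 9, 311, 1) True
```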

You may wish to remain on the more recent third-party driver.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
