Kernel panic with null pointer dereference at be_insert_vlan_in_pkt
Environment
- Red Hat Enterprise Linux 5
- kernel-2.6.18-371.el5
- Emulex OneConnect 10GbE network interface using be2net driver version v4.9.311.1, provided by HP's kmod-be2net-4.9.311.1-1.rhel5u10 RPM package
- NIC firmware 4.9.416.0
Issue
- Kernel panic with null pointer dereference in be_insert_vlan_in_pkt+702 or be_insert_vlan_in_pkt+0x2be
PID: 15565 TASK: ffff8125d2e3a830 CPU: 19 COMMAND: "processname"
#0 [ffff81203fe2f9f0] crash_kexec at ffffffff800b1509
#1 [ffff81203fe2fab0] __die at ffffffff80065137
#2 [ffff81203fe2faf0] do_page_fault at ffffffff80067430
#3 [ffff81203fe2fbe0] error_exit at ffffffff8005ddf9
[exception RIP: be_insert_vlan_in_pkt+702]
RIP: ffffffff882bf8a6 RSP: ffff81203fe2fc90 RFLAGS: 00010282
RAX: 0000000000000024 RBX: 0000000000000000 RCX: ffffffff803270a8
RDX: ffffffff803270a8 RSI: 0000000000000000 RDI: ffffffff803270a0
RBP: 0000000000000000 R8: ffffffff803270a8 R9: 0000000000000001
R10: 0000000000000740 R11: 00000000000000ae R12: 00000000000003e8
R13: ffff8120390d8500 R14: ffff81203fe2fd07 R15: ffff8120390d9280
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffff81203fe2fcb8] be_xmit at ffffffff882c1c47 [be2net]
#5 [ffff81203fe2fd38] dev_hard_start_xmit at ffffffff8023bff3
#6 [ffff81203fe2fd68] __qdisc_run at ffffffff8024c1ed
#7 [ffff81203fe2fda8] dev_queue_xmit at ffffffff8002fbe7
#8 [ffff81203fe2fdd8] bond_dev_queue_xmit at ffffffff8849bf3b [bonding]
#9 [ffff81203fe2fe08] bond_xmit_activebackup at ffffffff8849dc3d [bonding]
#10 [ffff81203fe2fe28] dev_hard_start_xmit at ffffffff8023bff3
#11 [ffff81203fe2fe58] dev_queue_xmit at ffffffff8002fc5c
#12 [ffff81203fe2fe88] arp_xmit at ffffffff8026c241
#13 [ffff81203fe2fe98] arp_solicit at ffffffff8026d42e
#14 [ffff81203fe2fee8] neigh_timer_handler at ffffffff80242adc
#15 [ffff81203fe2ff08] run_timer_softirq at ffffffff8009a86e
#16 [ffff81203fe2ff58] __do_softirq at ffffffff800125a2
#17 [ffff81203fe2ff88] call_softirq at ffffffff8005e30c
#18 [ffff81203fe2ffa0] do_softirq at ffffffff8006d644
#19 [ffff81203fe2ffb0] apic_timer_interrupt at ffffffff8005dc9e
--- <IRQ stack> ---
#20 [ffff8128ca2779c8] apic_timer_interrupt at ffffffff8005dc9e
#21 [ffff8128ca277b60] shrink_zone at ffffffff800132cc
#22 [ffff8128ca277ba0] try_to_free_pages at ffffffff800cfff1
#23 [ffff8128ca277c30] __alloc_pages at ffffffff8000f60a
#24 [ffff8128ca277ca0] tcp_sendmsg at ffffffff80026844
#25 [ffff8128ca277d30] do_sock_write at ffffffff800380a6
#26 [ffff8128ca277d60] sock_aio_write at ffffffff800479bc
#27 [ffff8128ca277e00] do_sync_write at ffffffff800183ce
#28 [ffff8128ca277f10] vfs_write at ffffffff80016b5e
#29 [ffff8128ca277f40] sys_write at ffffffff80017414
#30 [ffff8128ca277f80] tracesys at ffffffff8005d29e (via system_call)
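The two offsets quoted in the issue summary name the same location, written once in decimal and once in hexadecimal. As a minimal Python sketch (the variable names are illustrative only), the check below confirms this and also recovers the start address of the function from the exception RIP:

```python
# Values copied from the backtrace above.
rip = 0xffffffff882bf8a6      # exception RIP
offset_dec = 702              # be_insert_vlan_in_pkt+702

# +702 (decimal) and +0x2be (hexadecimal) name the same instruction.
assert offset_dec == 0x2be

# Subtracting the offset from RIP gives the function's start address.
func_start = rip - offset_dec
print(hex(func_start))        # 0xffffffff882bf5e8
```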
Resolution
A third party driver is in use.
Such software is not within the scope of Red Hat technical support.
The issue may be referred to the provider of the third-party driver, or reproduced on the driver which Red Hat supply in the kernel package and which Red Hat do support.
Diagnostic Steps
The kernel panicked due to a null pointer dereference in the Emulex driver code.
crash> sys | grep PANIC
PANIC: "Unable to handle kernel NULL pointer dereference at 00000000000000d8 RIP: "
crash> bt
PID: 15565 TASK: ffff8125d2e3a830 CPU: 19 COMMAND: "processname"
#0 [ffff81203fe2f9f0] crash_kexec at ffffffff800b1509
#1 [ffff81203fe2fab0] __die at ffffffff80065137
#2 [ffff81203fe2faf0] do_page_fault at ffffffff80067430
#3 [ffff81203fe2fbe0] error_exit at ffffffff8005ddf9
[exception RIP: be_insert_vlan_in_pkt+702]
RIP: ffffffff882bf8a6 RSP: ffff81203fe2fc90 RFLAGS: 00010282
RAX: 0000000000000024 RBX: 0000000000000000 RCX: ffffffff803270a8
RDX: ffffffff803270a8 RSI: 0000000000000000 RDI: ffffffff803270a0
RBP: 0000000000000000 R8: ffffffff803270a8 R9: 0000000000000001
R10: 0000000000000740 R11: 00000000000000ae R12: 00000000000003e8
R13: ffff8120390d8500 R14: ffff81203fe2fd07 R15: ffff8120390d9280
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffff81203fe2fcb8] be_xmit at ffffffff882c1c47 [be2net]
#5 [ffff81203fe2fd38] dev_hard_start_xmit at ffffffff8023bff3
#6 [ffff81203fe2fd68] __qdisc_run at ffffffff8024c1ed
#7 [ffff81203fe2fda8] dev_queue_xmit at ffffffff8002fbe7
#8 [ffff81203fe2fdd8] bond_dev_queue_xmit at ffffffff8849bf3b [bonding]
#9 [ffff81203fe2fe08] bond_xmit_activebackup at ffffffff8849dc3d [bonding]
#10 [ffff81203fe2fe28] dev_hard_start_xmit at ffffffff8023bff3
#11 [ffff81203fe2fe58] dev_queue_xmit at ffffffff8002fc5c
#12 [ffff81203fe2fe88] arp_xmit at ffffffff8026c241
#13 [ffff81203fe2fe98] arp_solicit at ffffffff8026d42e
#14 [ffff81203fe2fee8] neigh_timer_handler at ffffffff80242adc
#15 [ffff81203fe2ff08] run_timer_softirq at ffffffff8009a86e
#16 [ffff81203fe2ff58] __do_softirq at ffffffff800125a2
#17 [ffff81203fe2ff88] call_softirq at ffffffff8005e30c
#18 [ffff81203fe2ffa0] do_softirq at ffffffff8006d644
#19 [ffff81203fe2ffb0] apic_timer_interrupt at ffffffff8005dc9e
--- <IRQ stack> ---
...
crash> dis -r ffffffff882bf8a6 | tail
0xffffffff882bf877 <be_insert_vlan_in_pkt+655>: rol $0x8,%r12w
0xffffffff882bf87c <be_insert_vlan_in_pkt+660>: lea 0x4(%rbx),%rsi
0xffffffff882bf880 <be_insert_vlan_in_pkt+664>: mov %rbx,%rdi
0xffffffff882bf883 <be_insert_vlan_in_pkt+667>: callq 0xffffffff800233d5 <memmove>
0xffffffff882bf888 <be_insert_vlan_in_pkt+672>: movw $0x81,0xc(%rbx)
0xffffffff882bf88e <be_insert_vlan_in_pkt+678>: mov %r12w,0xe(%rbx)
0xffffffff882bf893 <be_insert_vlan_in_pkt+683>: subq $0x4,0x40(%rbp)
0xffffffff882bf898 <be_insert_vlan_in_pkt+688>: subq $0x4,0x38(%rbp)
0xffffffff882bf89d <be_insert_vlan_in_pkt+693>: movw $0x81,0x9e(%rbp)
0xffffffff882bf8a6 <be_insert_vlan_in_pkt+702>: mov 0xd8(%rbp),%rax
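The fault address in the panic message can be tied to this last instruction. `mov 0xd8(%rbp),%rax` loads from the address RBP + 0xd8, and the register dump shows RBP as zero, which suggests the pointer held in RBP was NULL when the driver tried to read a field 0xd8 bytes into it. A minimal Python sketch of that arithmetic (the variable names are illustrative only):

```python
# Values copied from the register dump and the PANIC line.
rbp = 0x0000000000000000             # RBP at the time of the exception
displacement = 0xd8                  # from: mov 0xd8(%rbp),%rax
reported_fault = 0x00000000000000d8  # address in the PANIC message

# x86-64 base+displacement addressing: effective address = base + displacement.
effective_address = rbp + displacement
assert effective_address == reported_fault
print(hex(effective_address))        # 0xd8
```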
It looks like the driver was attempting to insert a VLAN tag into a packet that was passed to it by the be_xmit() function.
crash> whatis be_xmit
int be_xmit(struct sk_buff *, struct net_device *);
crash> dis -r ffffffff8023bff3 | tail
0xffffffff8023bfd1 <dev_hard_start_xmit+405>: mov %rax,0x58(%rbp)
0xffffffff8023bfd5 <dev_hard_start_xmit+409>: jmp 0xffffffff8023bfdf <dev_hard_start_xmit+419>
0xffffffff8023bfd7 <dev_hard_start_xmit+411>: test %eax,%eax
0xffffffff8023bfd9 <dev_hard_start_xmit+413>: jne 0xffffffff8023c0d2 <dev_hard_start_xmit+662>
0xffffffff8023bfdf <dev_hard_start_xmit+419>: cmpq $0x0,0x0(%rbp)
0xffffffff8023bfe4 <dev_hard_start_xmit+424>: jne 0xffffffff8023c053 <dev_hard_start_xmit+535>
0xffffffff8023bfe6 <dev_hard_start_xmit+426>: mov %r14,%rsi <--
0xffffffff8023bfe9 <dev_hard_start_xmit+429>: mov %rbp,%rdi <--
0xffffffff8023bfec <dev_hard_start_xmit+432>: callq *0x290(%r14)
0xffffffff8023bff3 <dev_hard_start_xmit+439>: test %eax,%eax
crash> dis -r ffffffff882c1c47
0xffffffff882c1a69 <be_xmit>: push %r15
0xffffffff882c1a6b <be_xmit+2>: mov %rsi,%r15
0xffffffff882c1a6e <be_xmit+5>: add $0x1280,%r15
0xffffffff882c1a75 <be_xmit+12>: push %r14 <--
0xffffffff882c1a77 <be_xmit+14>: mov %rsi,%r14
0xffffffff882c1a7a <be_xmit+17>: add $0x500,%r14
0xffffffff882c1a81 <be_xmit+24>: push %r13
0xffffffff882c1a83 <be_xmit+26>: push %r12
0xffffffff882c1a85 <be_xmit+28>: mov %rsi,%r12
0xffffffff882c1a88 <be_xmit+31>: add $0x1288,%r12
0xffffffff882c1a8f <be_xmit+38>: push %rbp <--
crash> bt -f | awk '/#4/,/#5/'
#4 [ffff81203fe2fcb8] be_xmit at ffffffff882c1c47 [be2net]
ffff81203fe2fcc0: ffff81403fffe4e0 ffff8120390d8000
ffff81203fe2fcd0: ffff81403fffe4c0 0000000000000001
ffff81203fe2fce0: ffff81403fffe4c0 ffffffff80017a4e
ffff81203fe2fcf0: 0000000000000000 0000022000000040
ffff81203fe2fd00: 0000000000000003 ffff8120390d8000
ffff81203fe2fd10: ffff8137fbb410c0 0000000000000000
RBP R12
ffff81203fe2fd20: ffff81403e494600 ffff8120390d8000
R13 R14
ffff81203fe2fd30: 000000019b01a43f ffffffff8023bff3
R15 RIP
#5 [ffff81203fe2fd38] dev_hard_start_xmit at ffffffff8023bff3
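The register labels in the stack dump follow from the push order in the be_xmit prologue. The Python sketch below assumes the standard x86-64 System V calling convention (first argument in RDI, second in RSI) and that each push stores one 8-byte word directly below the previous one; the dictionary and variable names are hypothetical. It recovers the values that dev_hard_start_xmit held in RBP and R14, which it copied into RDI and RSI as the sk_buff and net_device arguments, and cross-checks the net_device pointer against the R13 and R15 values in the exception register dump.

```python
# Slot holding the return address into dev_hard_start_xmit (start of frame #5).
return_slot = 0xffff81203fe2fd38

# The last six words of the be_xmit frame, copied from "bt -f".
stack = {
    0xffff81203fe2fd10: 0xffff8137fbb410c0,
    0xffff81203fe2fd18: 0x0000000000000000,
    0xffff81203fe2fd20: 0xffff81403e494600,
    0xffff81203fe2fd28: 0xffff8120390d8000,
    0xffff81203fe2fd30: 0x000000019b01a43f,
    0xffff81203fe2fd38: 0xffffffff8023bff3,
}

# Push order seen in the be_xmit disassembly.
push_order = ["r15", "r14", "r13", "r12", "rbp"]

# Each push lands 8 bytes below the previous one, starting under the return address.
saved = {reg: stack[return_slot - 8 * (i + 1)] for i, reg in enumerate(push_order)}

skb = saved["rbp"]   # caller did: mov %rbp,%rdi (first argument)
dev = saved["r14"]   # caller did: mov %r14,%rsi (second argument)
print(hex(skb), hex(dev))

# be_xmit computes rsi+0x500 and rsi+0x1280; the results equal the
# R13 and R15 values in the exception register dump.
assert dev + 0x500 == 0xffff8120390d8500
assert dev + 0x1280 == 0xffff8120390d9280
```

If the assumptions hold, the sk_buff passed to be_xmit is at ffff8137fbb410c0 and the net_device at ffff8120390d8000.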
The system has since been updated to the latest kernel, 2.6.18-406.el5.
Whilst the kernel panic is in the be2net driver module, the driver in use is not supplied or supported by Red Hat.
The (U) unsigned flag indicates an Unsigned driver, that is a driver not distributed with the Red Hat kernel:
crash> mod -t | grep U
cciss 40(U)
be2net 40(U)
The path of the driver in the sosreport is a path which overrides the in-kernel modules:
$ grep be2net sos_commands/kernel/modinfo*
filename: /lib/modules/2.6.18-406.el5/weak-updates/be2net/be2net.ko
This driver appears to come from the following HP Emulex 10GbE Drivers for Red Hat Enterprise Linux 5 package:
$ grep be2net installed-rpms
kmod-be2net-4.9.311.1-1.rhel5u10.x86_64 Wed 23 Sep 2015 12:18:25 AM WIT
Please refer this issue to the third-party provider of the be2net module in use, which appears to be HP.
You are welcome to reproduce this with the in-kernel driver which Red Hat do supply and support. However, please note that the RHEL 5 in-kernel driver is a much older version than the one installed:
# modinfo be2net | egrep "^filename|^version"
filename: /lib/modules/2.6.18-406.el5/kernel/drivers/net/benet/be2net.ko
version: 4.2.116r
$ grep -A10 be2net sos_commands/kernel/modinfo* | egrep "^filename|^version"
filename: /lib/modules/2.6.18-406.el5/weak-updates/be2net/be2net.ko
version: 4.9.311.1
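To make the version gap concrete, the two version strings can be compared numerically. This is only an illustrative Python sketch: the `parse_version` helper is hypothetical, and it assumes the trailing `r` in the in-kernel version string can be ignored for ordering purposes.

```python
import re

def parse_version(text):
    """Turn '4.9.311.1' or '4.2.116r' into a tuple of ints, ignoring non-digits."""
    return tuple(int(part) for part in re.findall(r"\d+", text))

in_kernel = parse_version("4.2.116r")      # Red Hat in-kernel be2net
third_party = parse_version("4.9.311.1")   # HP kmod-be2net package

print(in_kernel, third_party)
assert in_kernel < third_party             # the in-kernel driver is the older one
```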
You may wish to remain on the more recent third-party driver.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
