RHEL 6.3: Why does the kernel panic with exception RIP: vlan_gro_common when modifying network node details?

Solution Unverified

Environment

  • Red Hat Enterprise Linux 6.3
  • VLAN

Issue

The kernel panics with exception RIP: vlan_gro_common() when network node details are modified.

Resolution

Try to reproduce the issue on RHEL 6.7 (kernel-2.6.32-573) or later.

The VLAN model changed completely as of development kernel 2.6.32-511, so this issue will not be investigated further within the old VLAN model.

Root Cause

When a node is being removed from a network group, bnx2_poll_work() still follows the vlan_gro_receive() processing path even though the underlying net_device is no longer a valid VLAN device. As a result, an invalid VLAN device pointer is read from the VLAN group, which causes the panic.

Diagnostic Steps

  1. An attempt was made to remove int001st001 from the network group INT:
[root@gremlin.mgmt001st001 ~]# chnwgroup INT --remove int001st001
EFSSG0105C Remote access execution error for nodes of cluster gremlin.storage.tucson.ibm.com. Cause:  Detailed Message :EFSSG0220C Cannot access a test file on host.
  2. At the time of the node removal from network group INT, int001st001 crashed:
 [root@gremlin.mgmt001st001 ~]# lsnode
Hostname     IP           Description             Role                 Product version Connection status  GPFS status CTDB status Last updated
int001st001  172.31.132.1                         interface            1.5.0.0-3       CONNECTION PROBLEM unknown     unreachable 1/25/13 8:07 AM
int002st001  172.31.132.2                         interface            1.5.0.0-3       OK                 active      active      1/25/13 8:07 AM
mgmt001st001 172.31.136.2 active

The backtrace from the vmcore follows:

crash> sys
      KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/vmlinux
    DUMPFILE: vmcore
        CPUS: 16
        DATE: Fri Jan 25 07:05:27 2013
      UPTIME: 7 days, 17:22:10
LOAD AVERAGE: 1.10, 1.17, 1.21
       TASKS: 749
    NODENAME: int001st001
     RELEASE: 2.6.32-279.el6.x86_64
     VERSION: #1 SMP Wed Jun 13 18:24:36 EDT 2012
     MACHINE: x86_64  (2400 Mhz)
      MEMORY: 64 GB
       PANIC: "Oops: 0000 [#1] SMP " (check log for details)
crash> bt
PID: 0      TASK: ffff88086c5cd540  CPU: 8   COMMAND: "swapper"
 #0 [ffff8800283038b0] machine_kexec at ffffffff8103281b
 #1 [ffff880028303910] crash_kexec at ffffffff810ba662
 #2 [ffff8800283039e0] oops_end at ffffffff81501290
 #3 [ffff880028303a10] no_context at ffffffff81043bab
 #4 [ffff880028303a60] __bad_area_nosemaphore at ffffffff81043e35
 #5 [ffff880028303ab0] bad_area_nosemaphore at ffffffff81043f03
 #6 [ffff880028303ac0] __do_page_fault at ffffffff81044661
 #7 [ffff880028303be0] do_page_fault at ffffffff8150326e
 #8 [ffff880028303c10] page_fault at ffffffff81500625
    [exception RIP: vlan_gro_common+211]
    RIP: ffffffff814d7a33  RSP: ffff880028303cc0  RFLAGS: 00010206
    RAX: 0000000000000001  RBX: 0000000000000088  RCX: ffff88069ef761c0
    RDX: 0000000000000088  RSI: ffff880d3ad40540  RDI: ffff88106a860720
    RBP: ffff880028303cd0   R8: ffff8804c7d6b020   R9: 0000000000000000
    R10: ffffffff81286af0  R11: 0000000000000000  R12: ffff88106a860720
    R13: ffff88106a8606e0  R14: 0000000000000001  R15: 000000000000010d
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880028303cd8] vlan_gro_receive at ffffffff814d7d6a
#10 [ffff880028303d08] bnx2_poll_work at ffffffffa000c813 [bnx2]
#11 [ffff880028303e18] bnx2_poll at ffffffffa000d579 [bnx2]
#12 [ffff880028303e68] net_rx_action at ffffffff8143f193
#13 [ffff880028303ec8] __do_softirq at ffffffff81073ec1
#14 [ffff880028303f38] call_softirq at ffffffff8100c24c
#15 [ffff880028303f50] do_softirq at ffffffff8100de85
#16 [ffff880028303f70] irq_exit at ffffffff81073ca5
#17 [ffff880028303f80] do_IRQ at ffffffff81505af5
--- <IRQ stack> ---
#18 [ffff88106cfc7e38] ret_from_intr at ffffffff8100ba53
    [exception RIP: mwait_idle+119]
    RIP: ffffffff81014877  RSP: ffff88106cfc7ee8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffff88106cfc7ef8  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff88106cfc7fd8  RDI: ffffffff81dda228
    RBP: ffffffff8100ba4e   R8: 0000000000000000   R9: 0000000000000002
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffffff810137f3
    R13: ffff88106cfc7e58  R14: ffffffff814fd830  R15: ffff88106cfc7ef8
    ORIG_RAX: ffffffffffffff24  CS: 0010  SS: 0018
#19 [ffff88106cfc7f00] cpu_idle at ffffffff81009e06
crash> quit

Here is further analysis of the vmcore:

crash> dis -lr vlan_gro_common+0xd3
[...]
/usr/src/debug/kernel-2.6.32-279.el6/linux-2.6.32-279.el6.x86_64/include/linux/if_vlan.h: 102
0xffffffff814d7a13 <vlan_gro_common+0xb3>:      mov    %edx,%eax
0xffffffff814d7a15 <vlan_gro_common+0xb5>:      shr    $0x9,%ax
0xffffffff814d7a19 <vlan_gro_common+0xb9>:      movzwl %ax,%eax
0xffffffff814d7a1c <vlan_gro_common+0xbc>:      mov    0x20(%rsi,%rax,8),%rax
/usr/src/debug/kernel-2.6.32-279.el6/linux-2.6.32-279.el6.x86_64/include/linux/if_vlan.h: 103
0xffffffff814d7a21 <vlan_gro_common+0xc1>:      test   %rax,%rax
0xffffffff814d7a24 <vlan_gro_common+0xc4>:      je     0xffffffff814d7b30 <vlan_gro_common+0x1d0>
0xffffffff814d7a2a <vlan_gro_common+0xca>:      mov    %rdx,%rbx
0xffffffff814d7a2d <vlan_gro_common+0xcd>:      and    $0x1ff,%ebx
0xffffffff814d7a33 <vlan_gro_common+0xd3>:      mov    (%rax,%rbx,8),%rax <=== 

%rax = 0x1
%rbx = 0x88
(%rax,%rbx,8) = 0x1 + (0x88 * 8) = 0x1 + 0x440 = 0x441

From the log:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000441
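
The address arithmetic above can be double-checked with a few lines of C. This is only a sketch of how an x86-64 (base,index,scale) operand resolves; the helper name effective_address() is made up for illustration and is not part of the kernel or of crash.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Effective address of an x86-64 memory operand written as
 * (base,index,scale) in AT&T syntax, with no displacement.
 * Hypothetical helper, for illustration only.
 */
static uint64_t effective_address(uint64_t base, uint64_t index, uint64_t scale)
{
    /* The hardware only supports scale factors of 1, 2, 4 and 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale;
}
```

With the register values from the oops, effective_address(0x1, 0x88, 8) evaluates to 0x441, which is the faulting address reported in the BUG line.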

static inline struct net_device *vlan_group_get_device(struct vlan_group *vg,
                                                       u16 vlan_id)
{
        struct net_device **array;
        array = vg->vlan_devices_arrays[vlan_id / VLAN_GROUP_ARRAY_PART_LEN];
        return array ? array[vlan_id % VLAN_GROUP_ARRAY_PART_LEN] : NULL;         <=== crashed while finding the real device for VLAN ID 0x88 i.e. 136 which is ethX0.136
}
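
To make the two-level lookup easier to follow, here is a small user-space sketch of how a VLAN ID is split into an array part and a slot within that part. The part length of 512 is inferred from the disassembly (shr $0x9 and and $0x1ff); the total of 4096 IDs and the struct and function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 4096 VLAN IDs split into parts of 512 entries each. */
#define VLAN_GROUP_ARRAY_LEN      4096
#define VLAN_GROUP_ARRAY_PART_LEN 512

/* Hypothetical helper type: which part, and which slot inside it. */
struct vlan_slot {
    unsigned int part;   /* index into vlan_devices_arrays[]  */
    unsigned int index;  /* index into the selected part array */
};

static struct vlan_slot vlan_slot_for(uint16_t vlan_id)
{
    struct vlan_slot s;

    assert(vlan_id < VLAN_GROUP_ARRAY_LEN);
    s.part  = vlan_id / VLAN_GROUP_ARRAY_PART_LEN;  /* same as vlan_id >> 9    */
    s.index = vlan_id % VLAN_GROUP_ARRAY_PART_LEN;  /* same as vlan_id & 0x1ff */
    return s;
}
```

For VLAN ID 0x88 this gives part 0 and slot 136, so the lookup reads vlan_devices_arrays[0] and then dereferences entry 136 of whatever that pointer holds.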

This looks like just a symptom.
Let's look for what went wrong before we arrived here.

One thing to take into account is that we were in the process of removing int001st001 from the network group INT.

The important structures in this context are:

struct bnx2_napi ffff88106a860720
struct sk_buff ffff88069ef761c0
struct net_device 0xffff88106a860020
struct vlan_group 0xffff880d3ad40540
struct bnx2 0xffff88106a8606e0

Let's see what these structures contain:

crash> whatis vlan_gro_receive
int vlan_gro_receive(struct napi_struct *, struct vlan_group *, unsigned int, struct sk_buff *);

crash> dis -lr ffffffffa000c813
/usr/src/debug/kernel-2.6.32-279.el6/linux-2.6.32-279.el6.x86_64/drivers/net/bnx2.c: 3462
0xffffffffa000c1e0 <bnx2_poll_work>:    push   %rbp
0xffffffffa000c1e1 <bnx2_poll_work+0x1>:        mov    %rsp,%rbp
0xffffffffa000c1e4 <bnx2_poll_work+0x4>:        push   %r15

[...]

0xffffffffa000c807 <bnx2_poll_work+0x627>:      movzwl %r12w,%edx
0xffffffffa000c80b <bnx2_poll_work+0x62b>:      mov    %rbx,%rcx  <==== rcx is sk_buff ffff88069ef761c0 
0xffffffffa000c80e <bnx2_poll_work+0x62e>:      callq  0xffffffff814d7cf0 <vlan_gro_receive>
/usr/src/debug/kernel-2.6.32-279.el6/linux-2.6.32-279.el6.x86_64/drivers/net/bnx2.c: 3256
0xffffffffa000c813 <bnx2_poll_work+0x633>:      addl   $0x1,-0xd4(%rbp)
crash>

We are operating on struct vlan_group. Let's see if it is intact and sane.

crash> struct vlan_group 0xffff880d3ad40540
struct vlan_group {
  real_dev = 0x3, 
  nr_vlans = 0x0, 
  hlist = {
    next = 0xffff880d14724000, 
    pprev = 0xffffffff81648de0
  }, 
  vlan_devices_arrays = {0x1, 0xffff880d3ad40568, 0xffff880d3ad40568, 0x0, 0x100000000, 0xdead000000100100, 0xdead000000200200, 0x0}, 
  rcu = {                ^^^ 
    next = 0x0, 
    func = 0
  }
}
crash> 

It looks like the VLAN has already been torn down, but the vlan_group pointer is not NULL.
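
One rough way to judge whether the dumped fields are sane is to classify each value by what a valid x86_64 kernel pointer would look like. The sketch below is a heuristic, not a kernel facility: the enum and function names are invented for this example, the 0xffff800000000000 threshold assumes 48-bit canonical addressing, and the two 0xdead... constants are assumed to be the x86_64 list-poison patterns (LIST_POISON1 and LIST_POISON2).

```c
#include <stdint.h>

/* Hypothetical classification used only for this illustration. */
enum ptr_kind { PTR_NULL, PTR_KERNEL, PTR_LIST_POISON, PTR_BOGUS };

static enum ptr_kind classify_pointer(uint64_t v)
{
    if (v == 0)
        return PTR_NULL;

    /* Assumed x86_64 list-poison values left behind by list_del(). */
    if (v == 0xdead000000100100ULL || v == 0xdead000000200200ULL)
        return PTR_LIST_POISON;

    /* Upper canonical half: where kernel addresses live (assumption: 48-bit VA). */
    if (v >= 0xffff800000000000ULL)
        return PTR_KERNEL;

    /* Non-NULL but not a plausible kernel address, e.g. 0x1 or 0x100000000. */
    return PTR_BOGUS;
}
```

Applied to the dump, real_dev (0x3) and vlan_devices_arrays[0] (0x1) both classify as bogus, and two slots hold what look like list-poison values. If that reading is right, it suggests the memory was freed and reused rather than still holding a live vlan_group, and it explains why the NULL check in vlan_group_get_device() did not catch slot 0.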

crash> struct sk_buff.dev ffff88069ef761c0
  dev = 0xffff88106a860020

crash> struct net_device.priv_flags 0xffff88106a860020
  priv_flags = 0x420

The VLAN device check is made as follows:

#define IFF_802_1Q_VLAN 0x1             /* 802.1Q VLAN device.          */

static inline int is_vlan_dev(struct net_device *dev)
{
        return dev->priv_flags & IFF_802_1Q_VLAN;
}               

So in our case it turns out to be:

0x420 & 0x1 = 0x00
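
The same check can be reproduced in user space with the priv_flags value read from the vmcore. This is a minimal sketch: struct mock_net_device is a hypothetical stand-in that carries only the one field needed here, not the kernel's struct net_device.

```c
#define IFF_802_1Q_VLAN 0x1  /* 802.1Q VLAN device, as in the kernel header */

/* Hypothetical stand-in for struct net_device; only priv_flags is modelled. */
struct mock_net_device {
    unsigned int priv_flags;
};

/* Same test as the kernel's is_vlan_dev(), applied to the mock structure. */
static int is_vlan_dev(const struct mock_net_device *dev)
{
    return dev->priv_flags & IFF_802_1Q_VLAN;
}
```

With priv_flags set to 0x420, is_vlan_dev() returns 0 because bit 0 is clear; a value such as 0x421 would return non-zero.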

This clearly indicates that the net_device we are working on is no longer a VLAN device, yet it is still being processed in the VLAN GRO path, which should not happen.
So it looks like the VLAN no longer exists, but skb->dev, being the primary interface, is still getting that treatment.
We should therefore not continue if skb->dev is not a VLAN device, because it appears the VLAN has already been torn down.

A check like the following should have been made at some point in the vlan_gro_receive() path:

static gro_result_t
vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
                unsigned int vlan_tci, struct sk_buff *skb)
{
        struct sk_buff *p;
        struct net_device *vlan_dev;
        u16 vlan_id;

        if (skb_bond_should_drop(skb, ACCESS_ONCE(skb->dev->master)))
                skb->deliver_no_wcard = 1;
 /* FROM_HERE */ 
        if (!is_vlan_dev(skb->dev)) {            <==== Check for vlan device 
              if (skb->vlan_tci & VLAN_VID_MASK)
                      skb->pkt_type = PACKET_OTHERHOST;
              return 0;
        } 
 /* TILL_HERE */

        skb->iif = skb->dev->ifindex;
        __vlan_hwaccel_put_tag(skb, vlan_tci);
        vlan_id = vlan_tci & VLAN_VID_MASK;
        vlan_dev = vlan_group_get_device(grp, vlan_id);

        if (vlan_dev)
                skb->dev = vlan_dev;
        else if (vlan_id)
                goto drop;

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
