Codenomicon test causes kernel panic

Solution Verified

Environment

  • Red Hat Enterprise Linux 5

Issue

  • The kernel panics during an IP stress test (such as a Codenomicon test run) or under high network traffic load.
  • The panic produces a backtrace similar to the following:

PID: 13966 TASK: ffff8102245f2100 CPU: 4 COMMAND: "cm_mon"
#0 [ffff81022417d880] crash_kexec at ffffffff800af897
#1 [ffff81022417d940] __die at ffffffff80065117
#2 [ffff81022417d980] do_page_fault at ffffffff8006748d
#3 [ffff81022417da70] error_exit at ffffffff8005dde9
[exception RIP: skb_dequeue+44]
RIP: ffffffff800478ac RSP: ffff81022417db28 RFLAGS: 00010007
RAX: 0000000000000000 RBX: ffff810226638c18 RCX: ffff81022417dc14
RDX: ffff81022417dbdc RSI: 0000000000000246 RDI: ffff810226638c2c
RBP: ffff81021d4e5ac0 R8: 0000000000000000 R9: 0000000000000000
R10: ffffffff88423360 R11: 0000000000000293 R12: ffff810226638c2c
R13: ffff810226638c18 R14: ffff81022417dc14 R15: 7fffffffffffffff
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffff81022417db40] __skb_recv_datagram at ffffffff80230fd4
#5 [ffff81022417dbc0] skb_recv_datagram at ffffffff800559de
#6 [ffff81022417dbe0] rawv6_recvmsg at ffffffff883eec6e
#7 [ffff81022417dc50] sock_common_recvmsg at ffffffff80031ea0
#8 [ffff81022417dc80] sock_recvmsg at ffffffff800307ac
#9 [ffff81022417de30] sys_recvfrom at ffffffff8002bb13
#10 [ffff81022417df50] compat_sys_socketcall at ffffffff80241a55
#11 [ffff81022417df80] cstar_do_call at ffffffff8006160c
RIP: 00000000ffffe405 RSP: 00000000f7c0c150 RFLAGS: 00000293
RAX: 0000000000000066 RBX: ffffffff8006160c RCX: 00000000f7c0c160
RDX: 0000000000040000 RSI: 0000000000000000 RDI: 00000000f7c0c280
RBP: 000000000000000c R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000286 R12: 00000000f7c0b74c
R13: f7c0c1ccf7c0c1d0 R14: 0000000000001000 R15: f7c0c28000000012
ORIG_RAX: 0000000000000066 CS: 0023 SS: 002b

Resolution

  • Bug 743375 was filed for this issue. Per the Bugzilla entry, the issue was fixed in erratum RHSA-2012-0480.

Diagnostic Steps

  • For the initial core:

The "skb" variable is held in %r15. It is loaded by dereferencing the "pskb" argument, which is passed into the routine in %rsi (the second argument register in the x86_64 calling convention):

/usr/src/debug/kernel-2.6.18/linux-2.6.18.x86_64/security/selinux/hooks.c: 3989
0xffffffff8012feb6 <selinux_ip_postroute_last+0x1f>:    mov    (%rsi),%r15

crash> sk_buff ffff81020d9c8880
struct sk_buff {
  next = 0x0,
  prev = 0x0,
  sk = 0xffff8102276fb040,
...

crash> sock 0xffff8102276fb040
...
  sk_security = 0x0,
...

  • Subsequent cores have shown different results, but all of them appear to involve corruption in or near skbs or related structures.
  • The customer has been asked to test older kernels to find a starting point for bisection.
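
Because the cores fault in different places, it can help to reduce each backtrace to its faulting function and immediate callers before comparing them. The following is a minimal sketch, assuming the "bt" output from crash has been saved as plain text. The function name summarize_bt and the regular expressions are illustrative assumptions, not part of the crash utility, and they only cover the line shapes seen in the trace above.

```python
import re

# Frame lines look like: "#4 [ffff81022417db40] __skb_recv_datagram at ffffffff80230fd4"
FRAME_RE = re.compile(r"#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+")
# Exception lines look like: "[exception RIP: skb_dequeue+44]"
EXC_RE = re.compile(r"\[exception RIP:\s*([\w.]+)\+(\d+)\]")


def summarize_bt(text):
    """Return the faulting symbol and call chain from saved crash 'bt' text.

    Hypothetical helper: it assumes one backtrace per input string.
    """
    frames = []     # every frame, in the order printed
    callers = []    # frames printed after the exception line
    fault = None    # (function, offset) taken from the exception RIP
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(m.group(2))
            if fault is not None:
                callers.append(m.group(2))
            continue
        m = EXC_RE.search(line)
        if m:
            fault = (m.group(1), int(m.group(2)))
    return {"fault": fault, "callers": callers, "frames": frames}


# A few lines copied from the trace in the Issue section.
SAMPLE = """\
#3 [ffff81022417da70] error_exit at ffffffff8005dde9
[exception RIP: skb_dequeue+44]
#4 [ffff81022417db40] __skb_recv_datagram at ffffffff80230fd4
#5 [ffff81022417dbc0] skb_recv_datagram at ffffffff800559de
#6 [ffff81022417dbe0] rawv6_recvmsg at ffffffff883eec6e
"""

if __name__ == "__main__":
    summary = summarize_bt(SAMPLE)
    print("fault:", summary["fault"])
    print("callers:", " <- ".join(summary["callers"]))
```

Running the same summary over each core and comparing the "fault" and "callers" values gives a quick view of whether the panics share a code path or only a damaged data structure.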

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
