NFS kernel crash with RHEL 6.3

Hi there,

I have also opened a support case for this issue, but I am posting here as well. On an HP DL580 G7 server we got the crash below:

WARNING: active task ffff887fec4bc040 on cpu 50: corrupt cpu value: 3708346440

     KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.el6.x86_64/vmlinux
   DUMPFILE: /var/crash/127.0.0.1-2013-02-08-18:11:42/vmcore  [PARTIAL DUMP]
       CPUS: 80
       DATE: Fri Feb  8 18:07:53 2013
     UPTIME: 2 days, 06:46:03
LOAD AVERAGE: 13.30, 12.83, 11.87
      TASKS: 2473
   NODENAME: ABC.XXX.net
    RELEASE: 2.6.32-279.el6.x86_64
    VERSION: #1 SMP Wed Jun 13 18:24:36 EDT 2012
    MACHINE: x86_64  (2264 Mhz)
     MEMORY: 512 GB
      PANIC: "Oops: 0000 [#1] SMP " (check log for details)
        PID: 78138
    COMMAND: "nfsd"
       TASK: ffff887fec4bc040  [THREAD_INFO: ffff887fdd08e000]
        CPU: 50
      STATE: TASK_UNINTERRUPTIBLE (PANIC)

crash> bt
PID: 78138  TASK: ffff887fec4bc040  CPU: 50  COMMAND: "nfsd"
#0 [ffff8820b0e87be0] machine_kexec at ffffffff8103281b
#1 [ffff8820b0e87c40] crash_kexec at ffffffff810ba662
#2 [ffff8820b0e87d10] oops_end at ffffffff81501290
#3 [ffff8820b0e87d40] die at ffffffff8100f26b
#4 [ffff8820b0e87d70] do_trap at ffffffff81500b84
#5 [ffff8820b0e87dd0] do_invalid_op at ffffffff8100ce35
#6 [ffff8820b0e87e70] invalid_op at ffffffff8100bedb
   [exception RIP: do_nmi+554]
   RIP: ffffffff8150105a  RSP: ffff8820b0e87f28  RFLAGS: 00010002
   RAX: ffff887fdd08ffd8  RBX: ffff8820b0e87f58  RCX: 00000000c0000101
   RDX: 00000000ffff8820  RSI: ffffffffffffffff  RDI: ffff8820b0e87f58
   RBP: ffff8820b0e87f48   R8: 0000000000000000   R9: 0000000000000004
   R10: ffffffff8163c9e0  R11: ffff881ff074c83f  R12: 0000000000000e30
   R13: ffff881ff0758b12  R14: ffff881ff0758c12  R15: ffff881ff0758c12
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#7 [ffff8820b0e87f50] nmi at ffffffff815008b0
   [exception RIP: fbcon_redraw+129]
   RIP: ffffffff812c01a1  RSP: ffff887fdd08da98  RFLAGS: 00000086
   RAX: 0000000000000001  RBX: ffff881ff0758c12  RCX: 0000000000000008
   RDX: 0000000000000007  RSI: ffff881ff0758c12  RDI: 0000000000000000
   RBP: ffff887fdd08daf8   R8: 0000000000000000   R9: 0000000000000004
   R10: ffffffff8163c9e0  R11: ffff881ff074c83f  R12: 0000000000000e30
   R13: ffff881ff0758b12  R14: ffff881ff0758c12  R15: ffff881ff0758c12
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#8 [ffff887fdd08da98] fbcon_redraw at ffffffff812c01a1
bt: cannot transition from exception stack to current process stack:
   exception stack pointer: ffff8820b0e87be0
     process stack pointer: ffff887fdd08db00
        current stack base: ffff887fdd08e000
crash>

I think the problematic function is fbcon_redraw, but it is not related to NFS. How can I debug this further, given that I cannot send the vmcore to support for security reasons?

Thanks,

Responses

Hello,

I have located your support case and will respond more fully there but wanted to post a couple of observations here.

WARNING: active task ffff887fec4bc040 on cpu 50: corrupt cpu value: 3708346440

The cpu value this line refers to is stored in the thread_info structure, so I suspect you will find that structure is corrupt. This is often caused by a stack overrun, because the thread_info struct is stored at the end of the process's stack (its lowest address, which the stack grows down towards). The overrun can come from the process itself or from a completely unrelated process. I'll give you more details in your support case, as I think that is a more appropriate place to continue this discussion for now.
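As a rough illustration of that reasoning, the addresses already shown in the crash output can be checked with a little arithmetic, without needing the vmcore. The sketch below assumes 8 KiB kernel stacks, which is not stated anywhere in this thread, and the helper names are made up for the example. It only shows whether the numbers are consistent with an overrun; it does not identify what caused one.

```python
# Values copied from the crash output above.
THREAD_INFO = 0xffff887fdd08e000   # THREAD_INFO of the nfsd task (also "current stack base")
NMI_RSP     = 0xffff887fdd08da98   # RSP saved when the NMI interrupted fbcon_redraw
CORRUPT_CPU = 3708346440           # value reported in the "corrupt cpu value" warning

# Assumption: 8 KiB kernel stacks (not confirmed in this thread).
THREAD_SIZE = 8192


def stack_range(thread_info, size=THREAD_SIZE):
    """Return (lowest, highest) address of the stack area that starts at thread_info."""
    return thread_info, thread_info + size


def bytes_below_base(rsp, thread_info):
    """How far rsp sits below the stack base; 0 means it is not below the base."""
    return max(0, thread_info - rsp)


def low32(addr):
    """Keep only the low 32 bits, since the cpu field is a 32-bit value."""
    return addr & 0xffffffff


if __name__ == "__main__":
    lo, hi = stack_range(THREAD_INFO)
    print("stack area: %#x - %#x" % (lo, hi))
    print("RSP is %d bytes below the stack base" % bytes_below_base(NMI_RSP, THREAD_INFO))
    print("corrupt cpu value in hex: %#x" % CORRUPT_CPU)
    print("offset from low 32 bits of thread_info: %#x"
          % (CORRUPT_CPU - low32(THREAD_INFO)))
```

With the values from this dump, the saved RSP is 1384 bytes below the stack base, and the corrupt cpu value is 0xdd08e048, which matches the low 32 bits of an address 0x48 bytes past the start of the thread_info area. Both observations are consistent with the stack having grown down over thread_info, although they do not prove which code path did it.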

Kindest regards,

Brad
