Kernel panics with the panic string "Kernel panic - not syncing: LBUG" caused by the third-party kernel module 'lustre'

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 6
  • Third-party kernel module installed: lustre

Issue

  • The server crashed and generated a vmcore. The following messages were observed in the kernel ring buffer:
Kernel panic - not syncing: LBUG
Pid: 4093, comm: explore Not tainted 2.6.32-504.30.3.el6.x86_64 #1
Call Trace:
 [<ffffffff815293fc>] ? panic+0xa7/0x16f
 [<ffffffffa06f5eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0d94713>] ? ras_stride_increase_window.clone.0+0x1d3/0x210 [lustre]
 [<ffffffffa0d94e57>] ? ras_update+0x707/0xc10 [lustre]
 [<ffffffffa0dc3a41>] ? vvp_page_assume+0x11/0xa0 [lustre]
 [<ffffffffa08555b8>] ? cl_page_invoid+0x68/0x160 [obdclass]
 [<ffffffffa0d956b8>] ? ll_readpage+0x358/0x1a30 [lustre]
 [<ffffffffa0cdde8a>] ? lov_stripe_size+0x1ba/0x250 [lov]
 [<ffffffffa0cde1e4>] ? lov_merge_lvb_kms+0x124/0x530 [lov]
 [<ffffffffa0c58ac9>] ? osc_lock_enqueue+0x2f9/0x910 [osc]
 [<ffffffffa0c59d90>] ? osc_lock_upcall+0x0/0x550 [osc]
 [<ffffffff81125eac>] ? generic_file_aio_read+0x1fc/0x700
 [<ffffffffa0dc5bbe>] ? vvp_io_read_start+0x22e/0x410 [lustre]
 [<ffffffffa085b41a>] ? cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa085efb4>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
 [<ffffffffa0d6646c>] ? ll_file_io_generic+0x1bc/0x7f0 [lustre]
 [<ffffffffa0d76d48>] ? ll_file_aio_read+0x1c8/0x7b0 [lustre]
 [<ffffffffa0d7746d>] ? ll_file_read+0x13d/0x270 [lustre]
 [<ffffffff8118eba5>] ? vfs_read+0xb5/0x1a0
 [<ffffffff8118ece1>] ? sys_read+0x51/0x90
 [<ffffffff8152d7de>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b

Resolution

  • The 'lustre' module is not shipped or maintained by Red Hat, so Red Hat has no visibility into how it operates. Contact the vendor of the third-party kernel module for further investigation and troubleshooting of the issue.

Root Cause

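  • The panic occurred with the message "Kernel panic - not syncing: LBUG". Inspection of the vmcore shows that the crash occurred in the third-party kernel module 'lustre': an assertion failed in ras_stride_increase_window() (rw.c:747), which called lbug_with_loc() in 'libcfs' and triggered the panic.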

Diagnostic Steps

Backtrace of the task that was running at the time of the panic:

crash> bt

PID: 4093   TASK: ffff88236db89520  CPU: 14  COMMAND: "explore"
 #0 [ffff881b0b9257a0] machine_kexec at ffffffff8103b60b
 #1 [ffff881b0b925800] crash_kexec at ffffffff810c99e2
 #2 [ffff881b0b9258d0] panic at ffffffff81529403
 #3 [ffff881b0b925950] lbug_with_loc at ffffffffa06f5eeb [libcfs]
 #4 [ffff881b0b925970] ras_stride_increase_window.clone.0 at ffffffffa0d94713 [lustre]
 #5 [ffff881b0b9259e0] ras_update at ffffffffa0d94e57 [lustre]
 #6 [ffff881b0b925a80] ll_readpage at ffffffffa0d956b8 [lustre]
 #7 [ffff881b0b925bb0] generic_file_aio_read at ffffffff81125eac
 #8 [ffff881b0b925c90] vvp_io_read_start at ffffffffa0dc5bbe [lustre]
 #9 [ffff881b0b925d00] cl_io_start at ffffffffa085b41a [obdclass]
#10 [ffff881b0b925d30] cl_io_loop at ffffffffa085efb4 [obdclass]
#11 [ffff881b0b925d60] ll_file_io_generic at ffffffffa0d6646c [lustre]
#12 [ffff881b0b925e00] ll_file_aio_read at ffffffffa0d76d48 [lustre]
#13 [ffff881b0b925e80] ll_file_read at ffffffffa0d7746d [lustre]
#14 [ffff881b0b925ef0] vfs_read at ffffffff8118eba5
#15 [ffff881b0b925f30] sys_read at ffffffff8118ece1
#16 [ffff881b0b925f80] system_call_fastpath at ffffffff8100b0d2
    RIP: 00002aaaaafad4c0  RSP: 00007fffffff7e20  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffffffff8100b0d2  RCX: 00007fffffff8b30
    RDX: 0000000000001fff  RSI: 00000000018adf20  RDI: 0000000000000081
    RBP: 00000000018adf20   R8: 0000000000000000   R9: 00002aaac1296010
    R10: 00000000019ff5d2  R11: 0000000000000246  R12: 00000000018adac0
    R13: 00000000018adb40  R14: 0000000000001fff  R15: 0000000001c11df0
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
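
The backtrace above can be scanned mechanically to see which frames belong to loadable modules (the name in square brackets) rather than the core kernel. The following is a minimal Python sketch, not part of the original analysis; it assumes the `bt` output has been saved as text, and the helper names `parse_frames` and `module_frames` are hypothetical. Only a few frames from the output above are embedded as sample data.

```python
import re

# A few frames copied from the `bt` output above (sample data only).
BT_OUTPUT = """\
 #2 [ffff881b0b9258d0] panic at ffffffff81529403
 #3 [ffff881b0b925950] lbug_with_loc at ffffffffa06f5eeb [libcfs]
 #4 [ffff881b0b925970] ras_stride_increase_window.clone.0 at ffffffffa0d94713 [lustre]
 #5 [ffff881b0b9259e0] ras_update at ffffffffa0d94e57 [lustre]
 #7 [ffff881b0b925bb0] generic_file_aio_read at ffffffff81125eac
 #9 [ffff881b0b925d00] cl_io_start at ffffffffa085b41a [obdclass]
"""

# "#N [stack address] function at text address [optional module]"
FRAME_RE = re.compile(
    r"^\s*#(?P<num>\d+)\s+\[[0-9a-f]+\]\s+(?P<func>\S+)\s+at\s+[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_frames(text):
    """Return (frame number, function, module or None) for each bt frame."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m["num"]), m["func"], m["module"]))
    return frames


def module_frames(frames):
    """Keep only frames that crash attributes to a loadable module."""
    return [f for f in frames if f[2] is not None]


frames = parse_frames(BT_OUTPUT)
for num, func, module in module_frames(frames):
    print(f"#{num:<2} {func} [{module}]")
```

In this trace the first module frame after panic() is lbug_with_loc in 'libcfs', called from ras_stride_increase_window in 'lustre', which is why the investigation focuses on that module.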

The kernel ring buffer contains the following messages:

crash> log

LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) ASSERTION( ras->ras_window_start + ras->ras_window_len >= ras->ras_stride_offset ) failed: window_start 96512, window_len 0 stride_offset 96549
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) LBUG
Pid: 4093, comm: explore

Call Trace:
 [<ffffffffa06f5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa06f5e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0d94713>] ras_stride_increase_window.clone.0+0x1d3/0x210 [lustre]
 [<ffffffffa0d94e57>] ras_update+0x707/0xc10 [lustre]
 [<ffffffffa0dc3a41>] ? vvp_page_assume+0x11/0xa0 [lustre]
 [<ffffffffa08555b8>] ? cl_page_invoid+0x68/0x160 [obdclass]
 [<ffffffffa0d956b8>] ll_readpage+0x358/0x1a30 [lustre]
 [<ffffffffa0cdde8a>] ? lov_stripe_size+0x1ba/0x250 [lov]
 [<ffffffffa0cde1e4>] ? lov_merge_lvb_kms+0x124/0x530 [lov]
 [<ffffffffa0c58ac9>] ? osc_lock_enqueue+0x2f9/0x910 [osc]
 [<ffffffffa0c59d90>] ? osc_lock_upcall+0x0/0x550 [osc]
 [<ffffffff81125eac>] generic_file_aio_read+0x1fc/0x700
 [<ffffffffa0dc5bbe>] vvp_io_read_start+0x22e/0x410 [lustre]
 [<ffffffffa085b41a>] cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa085efb4>] cl_io_loop+0xb4/0x1b0 [obdclass]
 [<ffffffffa0d6646c>] ll_file_io_generic+0x1bc/0x7f0 [lustre]
 [<ffffffffa0d76d48>] ll_file_aio_read+0x1c8/0x7b0 [lustre]
 [<ffffffffa0d7746d>] ll_file_read+0x13d/0x270 [lustre]
 [<ffffffff8118eba5>] vfs_read+0xb5/0x1a0
 [<ffffffff8118ece1>] sys_read+0x51/0x90
 [<ffffffff8152d7de>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b

Kernel panic - not syncing: LBUG
Pid: 4093, comm: explore Not tainted 2.6.32-504.30.3.el6.x86_64 #1
Call Trace:
 [<ffffffff815293fc>] ? panic+0xa7/0x16f
 [<ffffffffa06f5eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0d94713>] ? ras_stride_increase_window.clone.0+0x1d3/0x210 [lustre]
 [<ffffffffa0d94e57>] ? ras_update+0x707/0xc10 [lustre]
 [<ffffffffa0dc3a41>] ? vvp_page_assume+0x11/0xa0 [lustre]
 [<ffffffffa08555b8>] ? cl_page_invoid+0x68/0x160 [obdclass]
 [<ffffffffa0d956b8>] ? ll_readpage+0x358/0x1a30 [lustre]
 [<ffffffffa0cdde8a>] ? lov_stripe_size+0x1ba/0x250 [lov]
 [<ffffffffa0cde1e4>] ? lov_merge_lvb_kms+0x124/0x530 [lov]
 [<ffffffffa0c58ac9>] ? osc_lock_enqueue+0x2f9/0x910 [osc]
 [<ffffffffa0c59d90>] ? osc_lock_upcall+0x0/0x550 [osc]
 [<ffffffff81125eac>] ? generic_file_aio_read+0x1fc/0x700
 [<ffffffffa0dc5bbe>] ? vvp_io_read_start+0x22e/0x410 [lustre]
 [<ffffffffa085b41a>] ? cl_io_start+0x6a/0x140 [obdclass]
 [<ffffffffa085efb4>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
 [<ffffffffa0d6646c>] ? ll_file_io_generic+0x1bc/0x7f0 [lustre]
 [<ffffffffa0d76d48>] ? ll_file_aio_read+0x1c8/0x7b0 [lustre]
 [<ffffffffa0d7746d>] ? ll_file_read+0x13d/0x270 [lustre]
 [<ffffffff8118eba5>] ? vfs_read+0xb5/0x1a0
 [<ffffffff8118ece1>] ? sys_read+0x51/0x90
 [<ffffffff8152d7de>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
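
The failed assertion in the log above can be checked by hand from the values it prints. The short Python sketch below only redoes that arithmetic; it is not Lustre code, the helper name `parse_values` is hypothetical, and the units of the values are not interpreted here.

```python
import re

# The assertion line exactly as logged in the ring buffer above.
ASSERT_LINE = (
    "LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) ASSERTION( "
    "ras->ras_window_start + ras->ras_window_len >= ras->ras_stride_offset ) "
    "failed: window_start 96512, window_len 0 stride_offset 96549"
)


def parse_values(line):
    """Extract the three numeric values printed after 'failed:'."""
    pairs = re.findall(r"(window_start|window_len|stride_offset) (\d+)", line)
    return {name: int(value) for name, value in pairs}


values = parse_values(ASSERT_LINE)

# Condition the module asserted:
#   ras_window_start + ras_window_len >= ras_stride_offset
window_end = values["window_start"] + values["window_len"]
holds = window_end >= values["stride_offset"]
shortfall = values["stride_offset"] - window_end

print(f"window_start + window_len = {window_end}")
print(f"stride_offset             = {values['stride_offset']}")
print(f"assertion holds: {holds} (short by {shortfall})")
```

With window_len equal to 0, the left-hand side is just window_start (96512), which is 37 less than stride_offset (96549), so the condition is false and the module calls LBUG.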

The ring buffer also contains earlier errors from the third-party module 'lustre':

crash> log | grep -i 'LustreError'

LustreError: 15260:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 3222:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 5252:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 23420:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 1966:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) ASSERTION( ras->ras_window_start + ras->ras_window_len >= ras->ras_stride_offset ) failed: window_start 96512, window_len 0 stride_offset 96549
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) LBUG
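
To summarise output like the above, the LustreError lines can be grouped by the source location that logged them. This is a minimal Python sketch under stated assumptions: the log text is embedded as sample data, the helper names `tally` and `decode_rc` are hypothetical, and reading `rc = -2` as a negated errno value follows the usual kernel convention, which is assumed (not confirmed by this article) to apply to Lustre as well.

```python
import errno
import re
from collections import Counter

# LustreError lines copied from the `log | grep` output above.
LOG = """\
LustreError: 15260:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 3222:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 5252:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 23420:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 1966:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) ASSERTION( ras->ras_window_start + ras->ras_window_len >= ras->ras_stride_offset ) failed: window_start 96512, window_len 0 stride_offset 96549
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) LBUG
"""

# "LustreError: <pid>:<n>:(<file>:<line>:<function>()) <message>"
LINE_RE = re.compile(
    r"^LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)$"
)


def tally(log_text):
    """Count LustreError messages per (file, line, function)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counts[(m["file"], int(m["line"]), m["func"])] += 1
    return counts


def decode_rc(rc):
    """Map a negative return code to an errno name (assumed convention)."""
    return errno.errorcode.get(abs(rc), "unknown")


counts = tally(LOG)
for (src, lineno, func), n in counts.most_common():
    print(f"{n:>3}  {src}:{lineno} {func}()")
print("rc = -2 ->", decode_rc(-2))
```

Under that assumption, the repeated mdc_enqueue() messages report ENOENT, and only the two rw.c:747 lines from PID 4093 relate to the assertion that led to the panic.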

Tainted modules are listed below; the flags are (P) for proprietary and (U) for unsigned:

crash> mod -t | grep U

xvma       (U)
xpmem      (U)
numatools  (U)
hwperf     (U)
libcfs     (U)
lnet       (U)
obdclass   (U)
ptlrpc     (U)
ko2iblnd   (U)
fid        (U)
mdc        (U)
osc        (U)
lov        (U)
lustre     (U)     --------->>>
mgc        (U)
fld        (U)
lmv        (U)
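
If the module list is long, the taint flags can be parsed rather than read by eye. The Python sketch below is illustrative only: it assumes the `mod -t` output has been saved as text in the two-column form shown above, the helper name `parse_taint` is hypothetical, and only some of the modules listed above are embedded as sample data.

```python
import re

# A subset of the `mod -t | grep U` output above (sample data only).
MOD_T_OUTPUT = """\
xpmem      (U)
libcfs     (U)
obdclass   (U)
osc        (U)
lov        (U)
lustre     (U)     --------->>>
"""


def parse_taint(text):
    """Return {module name: set of taint flag letters} for each line."""
    modules = {}
    for line in text.splitlines():
        # Module name, whitespace, then flag letters in parentheses;
        # anything after the closing parenthesis is ignored.
        m = re.match(r"^(\S+)\s+\(([A-Z]+)\)", line)
        if m:
            modules[m.group(1)] = set(m.group(2))
    return modules


taint = parse_taint(MOD_T_OUTPUT)
unsigned = sorted(name for name, flags in taint.items() if "U" in flags)
proprietary = sorted(name for name, flags in taint.items() if "P" in flags)

print("unsigned:   ", ", ".join(unsigned))
print("proprietary:", ", ".join(proprietary) or "(none in this sample)")
```

Every module in the listing carries only the (U) flag, including 'lustre' and the modules it appears with in the call trace ('libcfs', 'obdclass', 'osc', 'lov').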

Additional details of the unsigned module 'lustre':

crash> mod | grep -e NAME -e lustre

     MODULE       NAME                    SIZE  OBJECT FILE
ffffffffa0dfa680  lustre                938007  (not loaded)  [CONFIG_KALLSYMS]


crash> module.name,version,srcversion,gpgsig_ok ffffffffa0dfa680

  name = "lustre\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  version = 0x0
  srcversion = 0xffff88120f102920 "812A14CB8665B61656F76F6"
  gpgsig_ok = 0

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
