Kernel panics with the panic string "Kernel panic - not syncing: LBUG" caused by the third-party kernel module 'lustre'
Environment
- Red Hat Enterprise Linux 6 (observed on kernel 2.6.32-504.30.3.el6.x86_64)
- Third-party kernel module: lustre
Issue
- The server crashed and generated a vmcore. The following messages were logged in the kernel ring buffer:
Kernel panic - not syncing: LBUG
Pid: 4093, comm: explore Not tainted 2.6.32-504.30.3.el6.x86_64 #1
Call Trace:
[<ffffffff815293fc>] ? panic+0xa7/0x16f
[<ffffffffa06f5eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa0d94713>] ? ras_stride_increase_window.clone.0+0x1d3/0x210 [lustre]
[<ffffffffa0d94e57>] ? ras_update+0x707/0xc10 [lustre]
[<ffffffffa0dc3a41>] ? vvp_page_assume+0x11/0xa0 [lustre]
[<ffffffffa08555b8>] ? cl_page_invoid+0x68/0x160 [obdclass]
[<ffffffffa0d956b8>] ? ll_readpage+0x358/0x1a30 [lustre]
[<ffffffffa0cdde8a>] ? lov_stripe_size+0x1ba/0x250 [lov]
[<ffffffffa0cde1e4>] ? lov_merge_lvb_kms+0x124/0x530 [lov]
[<ffffffffa0c58ac9>] ? osc_lock_enqueue+0x2f9/0x910 [osc]
[<ffffffffa0c59d90>] ? osc_lock_upcall+0x0/0x550 [osc]
[<ffffffff81125eac>] ? generic_file_aio_read+0x1fc/0x700
[<ffffffffa0dc5bbe>] ? vvp_io_read_start+0x22e/0x410 [lustre]
[<ffffffffa085b41a>] ? cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa085efb4>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0d6646c>] ? ll_file_io_generic+0x1bc/0x7f0 [lustre]
[<ffffffffa0d76d48>] ? ll_file_aio_read+0x1c8/0x7b0 [lustre]
[<ffffffffa0d7746d>] ? ll_file_read+0x13d/0x270 [lustre]
[<ffffffff8118eba5>] ? vfs_read+0xb5/0x1a0
[<ffffffff8118ece1>] ? sys_read+0x51/0x90
[<ffffffff8152d7de>] ? do_device_not_available+0xe/0x10
[<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
Resolution
- Red Hat does not ship the 'lustre' module and does not have its source code, so Red Hat has no visibility into how it operates. Contact the vendor of the third-party kernel module for further investigation and troubleshooting of the issue.
Root Cause
- In this case, the panic occurred with the message "Kernel panic - not syncing: LBUG". Inspection of the vmcore shows that the crash occurred in the third-party kernel module 'lustre': an assertion failed in ras_stride_increase_window() (rw.c:747), the module called LBUG, and the kernel panicked.
Diagnostic Steps
Backtrace of the task that was running at the time of the panic:
crash> bt
PID: 4093 TASK: ffff88236db89520 CPU: 14 COMMAND: "explore"
#0 [ffff881b0b9257a0] machine_kexec at ffffffff8103b60b
#1 [ffff881b0b925800] crash_kexec at ffffffff810c99e2
#2 [ffff881b0b9258d0] panic at ffffffff81529403
#3 [ffff881b0b925950] lbug_with_loc at ffffffffa06f5eeb [libcfs]
#4 [ffff881b0b925970] ras_stride_increase_window.clone.0 at ffffffffa0d94713 [lustre]
#5 [ffff881b0b9259e0] ras_update at ffffffffa0d94e57 [lustre]
#6 [ffff881b0b925a80] ll_readpage at ffffffffa0d956b8 [lustre]
#7 [ffff881b0b925bb0] generic_file_aio_read at ffffffff81125eac
#8 [ffff881b0b925c90] vvp_io_read_start at ffffffffa0dc5bbe [lustre]
#9 [ffff881b0b925d00] cl_io_start at ffffffffa085b41a [obdclass]
#10 [ffff881b0b925d30] cl_io_loop at ffffffffa085efb4 [obdclass]
#11 [ffff881b0b925d60] ll_file_io_generic at ffffffffa0d6646c [lustre]
#12 [ffff881b0b925e00] ll_file_aio_read at ffffffffa0d76d48 [lustre]
#13 [ffff881b0b925e80] ll_file_read at ffffffffa0d7746d [lustre]
#14 [ffff881b0b925ef0] vfs_read at ffffffff8118eba5
#15 [ffff881b0b925f30] sys_read at ffffffff8118ece1
#16 [ffff881b0b925f80] system_call_fastpath at ffffffff8100b0d2
RIP: 00002aaaaafad4c0 RSP: 00007fffffff7e20 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffffff8100b0d2 RCX: 00007fffffff8b30
RDX: 0000000000001fff RSI: 00000000018adf20 RDI: 0000000000000081
RBP: 00000000018adf20 R8: 0000000000000000 R9: 00002aaac1296010
R10: 00000000019ff5d2 R11: 0000000000000246 R12: 00000000018adac0
R13: 00000000018adb40 R14: 0000000000001fff R15: 0000000001c11df0
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
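The frames that belong to loadable modules can be picked out of the backtrace mechanically. The following is a minimal Python sketch, assuming the `bt` output has been saved as text in the layout shown above; the helper name `module_frames` and the shortened sample are illustrative and not part of the crash utility.

```python
import re

# First eight frames copied from the backtrace above.
BT_SAMPLE = """\
#0 [ffff881b0b9257a0] machine_kexec at ffffffff8103b60b
#1 [ffff881b0b925800] crash_kexec at ffffffff810c99e2
#2 [ffff881b0b9258d0] panic at ffffffff81529403
#3 [ffff881b0b925950] lbug_with_loc at ffffffffa06f5eeb [libcfs]
#4 [ffff881b0b925970] ras_stride_increase_window.clone.0 at ffffffffa0d94713 [lustre]
#5 [ffff881b0b9259e0] ras_update at ffffffffa0d94e57 [lustre]
#6 [ffff881b0b925a80] ll_readpage at ffffffffa0d956b8 [lustre]
#7 [ffff881b0b925bb0] generic_file_aio_read at ffffffff81125eac
"""

# "#N [stack address] function at text address [module]"; the module
# suffix is only present for frames inside a loadable module.
FRAME = re.compile(
    r"^#(\d+)\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)


def module_frames(bt_text):
    """Return (frame number, function, module) for frames in modules."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME.match(line.strip())
        if match and match.group(3):
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


if __name__ == "__main__":
    for number, function, module in module_frames(BT_SAMPLE):
        print(f"#{number} {function} [{module}]")
```

For this backtrace, the first module frame is lbug_with_loc in libcfs, called from ras_stride_increase_window in lustre.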
The kernel ring buffer log is shown below:
crash> log
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) ASSERTION( ras->ras_window_start + ras->ras_window_len >= ras->ras_stride_offset ) failed: window_start 96512, window_len 0 stride_offset 96549
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) LBUG
Pid: 4093, comm: explore
Call Trace:
[<ffffffffa06f5895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa06f5e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0d94713>] ras_stride_increase_window.clone.0+0x1d3/0x210 [lustre]
[<ffffffffa0d94e57>] ras_update+0x707/0xc10 [lustre]
[<ffffffffa0dc3a41>] ? vvp_page_assume+0x11/0xa0 [lustre]
[<ffffffffa08555b8>] ? cl_page_invoid+0x68/0x160 [obdclass]
[<ffffffffa0d956b8>] ll_readpage+0x358/0x1a30 [lustre]
[<ffffffffa0cdde8a>] ? lov_stripe_size+0x1ba/0x250 [lov]
[<ffffffffa0cde1e4>] ? lov_merge_lvb_kms+0x124/0x530 [lov]
[<ffffffffa0c58ac9>] ? osc_lock_enqueue+0x2f9/0x910 [osc]
[<ffffffffa0c59d90>] ? osc_lock_upcall+0x0/0x550 [osc]
[<ffffffff81125eac>] generic_file_aio_read+0x1fc/0x700
[<ffffffffa0dc5bbe>] vvp_io_read_start+0x22e/0x410 [lustre]
[<ffffffffa085b41a>] cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa085efb4>] cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0d6646c>] ll_file_io_generic+0x1bc/0x7f0 [lustre]
[<ffffffffa0d76d48>] ll_file_aio_read+0x1c8/0x7b0 [lustre]
[<ffffffffa0d7746d>] ll_file_read+0x13d/0x270 [lustre]
[<ffffffff8118eba5>] vfs_read+0xb5/0x1a0
[<ffffffff8118ece1>] sys_read+0x51/0x90
[<ffffffff8152d7de>] ? do_device_not_available+0xe/0x10
[<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: LBUG
Pid: 4093, comm: explore Not tainted 2.6.32-504.30.3.el6.x86_64 #1
Call Trace:
[<ffffffff815293fc>] ? panic+0xa7/0x16f
[<ffffffffa06f5eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa0d94713>] ? ras_stride_increase_window.clone.0+0x1d3/0x210 [lustre]
[<ffffffffa0d94e57>] ? ras_update+0x707/0xc10 [lustre]
[<ffffffffa0dc3a41>] ? vvp_page_assume+0x11/0xa0 [lustre]
[<ffffffffa08555b8>] ? cl_page_invoid+0x68/0x160 [obdclass]
[<ffffffffa0d956b8>] ? ll_readpage+0x358/0x1a30 [lustre]
[<ffffffffa0cdde8a>] ? lov_stripe_size+0x1ba/0x250 [lov]
[<ffffffffa0cde1e4>] ? lov_merge_lvb_kms+0x124/0x530 [lov]
[<ffffffffa0c58ac9>] ? osc_lock_enqueue+0x2f9/0x910 [osc]
[<ffffffffa0c59d90>] ? osc_lock_upcall+0x0/0x550 [osc]
[<ffffffff81125eac>] ? generic_file_aio_read+0x1fc/0x700
[<ffffffffa0dc5bbe>] ? vvp_io_read_start+0x22e/0x410 [lustre]
[<ffffffffa085b41a>] ? cl_io_start+0x6a/0x140 [obdclass]
[<ffffffffa085efb4>] ? cl_io_loop+0xb4/0x1b0 [obdclass]
[<ffffffffa0d6646c>] ? ll_file_io_generic+0x1bc/0x7f0 [lustre]
[<ffffffffa0d76d48>] ? ll_file_aio_read+0x1c8/0x7b0 [lustre]
[<ffffffffa0d7746d>] ? ll_file_read+0x13d/0x270 [lustre]
[<ffffffff8118eba5>] ? vfs_read+0xb5/0x1a0
[<ffffffff8118ece1>] ? sys_read+0x51/0x90
[<ffffffff8152d7de>] ? do_device_not_available+0xe/0x10
[<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
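The failed assertion in the log can be checked by hand from the values it reports. The sketch below, in Python, only re-evaluates the logged condition (window_start + window_len >= stride_offset) with the logged numbers; it does not reproduce any Lustre code, and the helper names are illustrative.

```python
import re

# Assertion line copied from the ring buffer output above.
LINE = (
    "LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) "
    "ASSERTION( ras->ras_window_start + ras->ras_window_len >= "
    "ras->ras_stride_offset ) failed: window_start 96512, "
    "window_len 0 stride_offset 96549"
)


def parse_ras_values(line):
    """Extract the three numeric fields printed after 'failed:'."""
    values = {}
    for name in ("window_start", "window_len", "stride_offset"):
        # \b keeps 'ras->ras_window_start' from matching, because the
        # underscore before 'window' is a word character.
        match = re.search(rf"\b{name}\s+(\d+)", line)
        if match is None:
            raise ValueError(f"{name} not found in log line")
        values[name] = int(match.group(1))
    return values


def assertion_holds(values):
    """Re-evaluate the condition quoted in the ASSERTION message."""
    window_end = values["window_start"] + values["window_len"]
    return window_end >= values["stride_offset"]


if __name__ == "__main__":
    vals = parse_ras_values(LINE)
    window_end = vals["window_start"] + vals["window_len"]
    print("window end:", window_end)
    print("stride offset:", vals["stride_offset"])
    print("assertion holds:", assertion_holds(vals))
    print("shortfall:", vals["stride_offset"] - window_end)
```

With the logged values, 96512 + 0 = 96512 is less than 96549, so the condition is false by 37 and the module called LBUG.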
The ring buffer also contains errors related to the third-party module 'lustre':
crash> log | grep -i 'LustreError'
LustreError: 15260:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 3222:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 5252:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 23420:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 1966:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) ASSERTION( ras->ras_window_start + ras->ras_window_len >= ras->ras_stride_offset ) failed: window_start 96512, window_len 0 stride_offset 96549
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) LBUG
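When the ring buffer holds many LustreError entries, grouping them by source location makes the repeated ones stand out. This is a minimal Python sketch, assuming the output of the grep above has been saved as text and that each entry follows the "LustreError: PID:N:(file:line:function())" layout seen here; the `summarize` helper is hypothetical.

```python
import re
from collections import Counter

# Lines copied from the grep output above.
LOG_SAMPLE = """\
LustreError: 15260:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 3222:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 5252:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 23420:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 1966:0:(mdc_locks.c:920:mdc_enqueue()) scratch1-MDT0000-mdc-ffff88121d0eec00: ldlm_cli_enqueue failed: rc = -2
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) ASSERTION( ras->ras_window_start + ras->ras_window_len >= ras->ras_stride_offset ) failed: window_start 96512, window_len 0 stride_offset 96549
LustreError: 4093:0:(rw.c:747:ras_stride_increase_window()) LBUG
"""

# Captures the PID, source file, source line and function name.
ENTRY = re.compile(r"^LustreError: (\d+):\d+:\(([\w.]+):(\d+):(\w+)\(\)\)")


def summarize(log_text):
    """Count LustreError entries per (file, line, function)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            _pid, src, lineno, func = match.groups()
            counts[(src, int(lineno), func)] += 1
    return counts


if __name__ == "__main__":
    for (src, lineno, func), count in summarize(LOG_SAMPLE).most_common():
        print(f"{count:3d}  {src}:{lineno} {func}()")
```

In this sample, five entries come from mdc_enqueue() in different processes, while the two entries from ras_stride_increase_window() both belong to PID 4093, the task that panicked.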
Modules flagged as proprietary (P) or unsigned (U) are listed below:
crash> mod -t | grep U
xvma (U)
xpmem (U)
numatools (U)
hwperf (U)
libcfs (U)
lnet (U)
obdclass (U)
ptlrpc (U)
ko2iblnd (U)
fid (U)
mdc (U)
osc (U)
lov (U)
lustre (U) --------->>>
mgc (U)
fld (U)
lmv (U)
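The taint listing can also be turned into a lookup table, which is convenient for checking whether a module seen in the backtrace is among the flagged ones. A minimal Python sketch follows, assuming the listing has been saved as text with one "name (FLAGS)" pair per line as shown above; the `tainted_modules` helper is hypothetical.

```python
import re

# Module listing copied from the output above.
MOD_T_OUTPUT = """\
xvma (U)
xpmem (U)
numatools (U)
hwperf (U)
libcfs (U)
lnet (U)
obdclass (U)
ptlrpc (U)
ko2iblnd (U)
fid (U)
mdc (U)
osc (U)
lov (U)
lustre (U) --------->>>
mgc (U)
fld (U)
lmv (U)
"""

# Module name followed by its taint flags in parentheses; anything
# after the closing parenthesis (such as the arrow marker) is ignored.
TAINT_LINE = re.compile(r"^(\S+)\s+\(([A-Z]+)\)")


def tainted_modules(text):
    """Map each module name to its taint flag string."""
    modules = {}
    for line in text.splitlines():
        match = TAINT_LINE.match(line.strip())
        if match:
            modules[match.group(1)] = match.group(2)
    return modules


if __name__ == "__main__":
    flagged = tainted_modules(MOD_T_OUTPUT)
    # Modules that appear in the backtrace of the panicking task.
    for name in ("libcfs", "obdclass", "lustre"):
        print(name, flagged.get(name, "not flagged"))
```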
Additional details of the third-party module 'lustre':
crash> mod | grep -e NAME -e lustre
MODULE NAME SIZE OBJECT FILE
ffffffffa0dfa680 lustre 938007 (not loaded) [CONFIG_KALLSYMS]
crash> module.name,version,srcversion,gpgsig_ok ffffffffa0dfa680
name = "lustre\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
version = 0x0
srcversion = 0xffff88120f102920 "812A14CB8665B61656F76F6"
gpgsig_ok = 0
