Kernel panics with "Kernel BUG at ...uyang/BUILD/ocfs2-1.4.10/fs/ocfs2/dlm/dlmconvert.c:505" caused by the third-party kernel module 'ocfs2_dlm'
Environment
- RHEL 5
- Third-party kernel module installed: ocfs2_dlm
Issue
- A server crash occurred and generated a vmcore. The following messages were observed in the kernel ring buffer:
Kernel BUG at ...uyang/BUILD/ocfs2-1.4.10/fs/ocfs2/dlm/dlmconvert.c:505
invalid opcode: 0000 [1] SMP
last sysfs file: /block/cciss!c0d0/cciss!c0d0p1/stat
CPU 7
Modules linked in: oracleacfs(PU) oracleadvm(PU) oracleoks(PU) nfs nfs_acl mptctl mptbase autofs4 i2c_dev i2c_core ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bonding ipv6 xfrm_nalgo crypto_api dm_round_robin dm_multipath scsi_dh parport_pc lp parport joydev sg shpchp tpm_tis i7core_edac tpm edac_mc e1000e hpilo tpm_bios bnx2 serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage lpfc scsi_transport_fc cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 11904, comm: o2net Tainted: P ---- 2.6.18-274.12.1.el5 #1
RIP: 0010:[<ffffffff884ddaa2>] [<ffffffff884ddaa2>] :ocfs2_dlm:dlm_convert_lock_handler+0x5d8/0x7da
RSP: 0018:ffff81043d47dd50 EFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000000028 RCX: ffff8115f0e44c28
RDX: ffff8102d184cb50 RSI: ffff81007608f047 RDI: ffff8102d184cb8c
RBP: 0000000040000500 R08: ffff81240c420a58 R09: 000000000002fa58
R10: 00000000000004a0 R11: ffff81043d47dda0 R12: ffff8115189cca40
R13: ffff8102d184cb40 R14: ffff81007608f018 R15: ffff8115f0e44c00
FS: 0000000000000000(0000) GS:ffff8112201483c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000006d9718a CR3: 00000015a182d000 CR4: 00000000000006a0
Process o2net (pid: 11904, threadinfo ffff81043d47c000, task ffff81241a71a820)
Stack: ffff81122016dc00 ffff810403fbf400 0000000000000000 ffff81241c4f2ec0
ffff81007608f000 0000000000000000 ffff810403fbf400 0000000000000000
ffffffff884a7db6 ffffffff884a6552 0000000000000000 0000000000000000
Call Trace:
[<ffffffff884a7db6>] :ocfs2_nodemanager:o2net_rx_until_empty+0x0/0xa36
[<ffffffff884a6552>] :ocfs2_nodemanager:o2net_process_message+0x32f/0x507
[<ffffffff884a7db6>] :ocfs2_nodemanager:o2net_rx_until_empty+0x0/0xa36
[<ffffffff884a8649>] :ocfs2_nodemanager:o2net_rx_until_empty+0x893/0xa36
[<ffffffff884a7db6>] :ocfs2_nodemanager:o2net_rx_until_empty+0x0/0xa36
[<ffffffff8004d32e>] run_workqueue+0x9e/0xfb
[<ffffffff80049b3d>] worker_thread+0x0/0x122
[<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
[<ffffffff80049c2d>] worker_thread+0xf0/0x122
[<ffffffff8008e880>] default_wake_function+0x0/0xe
[<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
[<ffffffff8003270f>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032611>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
Code: 0f 0b 68 01 c3 4f 88 c2 f9 01 eb fe 83 c8 04 41 89 44 24 04
RIP [<ffffffff884ddaa2>] :ocfs2_dlm:dlm_convert_lock_handler+0x5d8/0x7da
RSP <ffff81043d47dd50>
Resolution
- Red Hat does not ship 'ocfs2_dlm' and does not have its source code, so Red Hat has no visibility into how it operates. Contact the vendor of the third-party kernel modules for further investigation and troubleshooting of this issue, and include the vmcore and the module details (version 1.4.10, srcversion D2F019D4471AB90EBD14436) shown under Diagnostic Steps.
Root Cause
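- In this case, the panic occurred with the message "Kernel BUG at ...uyang/BUILD/ocfs2-1.4.10/fs/ocfs2/dlm/dlmconvert.c:505". Inspection of the vmcore shows that the crash occurred in the third-party kernel module 'ocfs2_dlm'. The accompanying "invalid opcode" exception is how a BUG() assertion failure manifests on x86_64: the check at line 505 of the module's own dlmconvert.c failed, and the module deliberately halted the kernel.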
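The 'Code:' line of the oops can be used to confirm the line number independently of the message text. On x86_64 kernels of this generation, BUG() is understood to emit a fixed "bug frame": 'ud2' (0f 0b), then 'pushq $<file name address>' (68 plus 4 bytes), then 'ret $<line>' (c2 plus 2 bytes). The Python sketch below assumes that layout and that the 'Code:' bytes start at RIP; decode_bug_frame is a hypothetical helper, not part of crash or the kernel.

```python
import struct

# "Code:" bytes from the oops above (assumed to start at the faulting RIP).
CODE = "0f 0b 68 01 c3 4f 88 c2 f9 01 eb fe 83 c8 04 41 89 44 24 04"


def decode_bug_frame(code_line):
    """Decode an assumed x86_64 BUG() frame: ud2; pushq $file; ret $line.

    Returns (file_name_address, line_number).
    """
    raw = bytes.fromhex(code_line)  # fromhex() ignores the spaces between bytes
    if raw[0:2] != b"\x0f\x0b":
        raise ValueError("bytes at RIP do not start with ud2 (0f 0b)")
    if raw[2] != 0x68 or raw[7] != 0xC2:
        raise ValueError("bytes do not match the assumed pushq/ret layout")
    # pushq sign-extends its 32-bit immediate to a 64-bit kernel address.
    (file_imm,) = struct.unpack_from("<i", raw, 3)
    file_addr = file_imm & 0xFFFFFFFFFFFFFFFF
    # The 16-bit immediate of "ret" carries the source line number.
    (line,) = struct.unpack_from("<H", raw, 8)
    return file_addr, line


addr, line = decode_bug_frame(CODE)
print(f"file name string at {addr:#x}, line {line}")
```

For this oops the sketch prints "file name string at 0xffffffff884fc301, line 505": the bytes 'f9 01' are the little-endian value 0x01f9, which is 505 and matches dlmconvert.c:505. In crash, `rd -a 0xffffffff884fc301` can be used to display the string stored at that address.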
Diagnostic Steps
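- Analysis of the vmcore shows that the panic occurred in the function 'dlm_convert_lock_handler' of the third-party module 'ocfs2_dlm'. The call trace shows it was invoked through o2net_process_message() in 'ocfs2_nodemanager' from the o2net kernel thread, that is, while processing a lock-convert message received over the cluster network.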
Backtrace of the task running at the time of the panic:
crash> bt
PID: 11904 TASK: ffff81241a71a820 CPU: 7 COMMAND: "o2net"
#0 [ffff81043d47dab0] crash_kexec at ffffffff800afee7
#1 [ffff81043d47db70] __die at ffffffff80065127
#2 [ffff81043d47dbb0] die at ffffffff8006c779
#3 [ffff81043d47dbe0] do_invalid_op at ffffffff8006cd39
#4 [ffff81043d47dca0] error_exit at ffffffff8005dde9
[exception RIP: dlm_convert_lock_handler+1496]
RIP: ffffffff884ddaa2 RSP: ffff81043d47dd50 RFLAGS: 00010202
RAX: 0000000000000004 RBX: 0000000000000028 RCX: ffff8115f0e44c28
RDX: ffff8102d184cb50 RSI: ffff81007608f047 RDI: ffff8102d184cb8c
RBP: 0000000040000500 R8: ffff81240c420a58 R9: 000000000002fa58
R10: 00000000000004a0 R11: ffff81043d47dda0 R12: ffff8115189cca40
R13: ffff8102d184cb40 R14: ffff81007608f018 R15: ffff8115f0e44c00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [ffff81043d47ddf8] o2net_rx_until_empty at ffffffff884a8649 [ocfs2_nodemanager]
#6 [ffff81043d47df48] kernel_thread at ffffffff8005dfb1
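The oops prints the faulting location as a hexadecimal offset/size pair (dlm_convert_lock_handler+0x5d8/0x7da), while crash's bt prints a decimal offset (dlm_convert_lock_handler+1496). The Python sketch below, an illustration only (parse_oops_symbol and parse_bt_symbol are hypothetical helpers, not crash features), confirms that both refer to the same instruction and derives the start address of the function from RIP.

```python
import re

# The same location as printed by the oops (hex offset/size) and by crash "bt" (decimal offset).
OOPS_RIP = ":ocfs2_dlm:dlm_convert_lock_handler+0x5d8/0x7da"
BT_RIP = "dlm_convert_lock_handler+1496"
RIP_ADDR = 0xFFFFFFFF884DDAA2


def parse_oops_symbol(text):
    """Split ':module:func+0xOFF/0xSIZE' into (module, func, offset, size)."""
    m = re.fullmatch(r":(\w+):(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    if m is None:
        raise ValueError(f"unrecognised symbol: {text!r}")
    return m.group(1), m.group(2), int(m.group(3), 16), int(m.group(4), 16)


def parse_bt_symbol(text):
    """Split crash's 'func+OFFSET' (decimal offset) into (func, offset)."""
    func, _, off = text.partition("+")
    return func, int(off)


module, func, offset, size = parse_oops_symbol(OOPS_RIP)
# Both representations must name the same function and the same byte offset.
assert (func, offset) == parse_bt_symbol(BT_RIP)
print(f"{module}:{func} starts at {RIP_ADDR - offset:#x}; fault at byte {offset} of {size}")
```

Here 0x5d8 is 1496 and 0x7da is 2010, so the faulting instruction lies inside the function, which starts at 0xffffffff884dd4ca.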
Kernel ring buffer messages:
crash> log
(The output is identical to the kernel ring buffer messages shown in the Issue section.)
Third-party modules that taint the kernel (P = proprietary, U = unsigned):
crash> mod -t | grep U
ocfs2_nodemanager 40(U)
ocfs2_dlm 40(U) ----->>>
ocfs2_dlmfs 40(U)
ocfs2 40(U)
oracleoks 41(U)
oracleadvm 41(U)
oracleacfs 41(U)
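The same taint flags are also visible in the "Modules linked in:" line of the oops, where each flagged module carries its letters in parentheses. A small Python sketch that extracts them is shown below; the excerpt is shortened from the full line in the log, and tainting_modules is a hypothetical helper.

```python
import re

# Shortened excerpt of the "Modules linked in:" line from the oops
# (some unflagged modules kept to show that they are ignored).
MODULES_LINE = (
    "Modules linked in: oracleacfs(PU) oracleadvm(PU) oracleoks(PU) nfs nfs_acl "
    "ocfs2(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs lockd sunrpc"
)


def tainting_modules(line):
    """Map each module printed with taint flags, e.g. 'ocfs2_dlm(U)', to its flag letters."""
    body = line.split(":", 1)[1]
    return {m.group(1): m.group(2) for m in re.finditer(r"(\w+)\(([A-Z]+)\)", body)}


for name, flags in tainting_modules(MODULES_LINE).items():
    print(f"{name:20} {flags}")
```

This lists the same seven modules as `mod -t`: ocfs2_dlm and the other ocfs2 modules carry only U, while the oracle* modules carry both P and U.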
Additional details of the third-party module 'ocfs2_dlm':
crash> mod | grep -e NAME -e ocfs2_dlm
MODULE NAME SIZE OBJECT FILE
ffffffff8850c780 ocfs2_dlm 235944 (not loaded) [CONFIG_KALLSYMS]
crash> module.name,version,srcversion,gpgsig_ok ffffffff8850c780
name = "ocfs2_dlm\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
version = 0xffff811202868d40 "1.4.10"
srcversion = 0xffff811202868520 "D2F019D4471AB90EBD14436"
gpgsig_ok = 0
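Before contacting the vendor, it may help to confirm that the module installed on the node is the same build that crashed. The Python sketch below compares the version and srcversion recorded in the vmcore with the values reported by `modinfo -F <field>`. It assumes Python 3.5 or later on a node that has the same ocfs2 packages installed; installed_module_fields and mismatched_fields are hypothetical helpers.

```python
import subprocess

# Module identity as recorded in the vmcore:
#   crash> module.name,version,srcversion,gpgsig_ok ffffffff8850c780
VMCORE_MODULE = {"version": "1.4.10", "srcversion": "D2F019D4471AB90EBD14436"}


def installed_module_fields(module, fields=("version", "srcversion")):
    """Read fields of the module installed on this host via `modinfo -F <field>`."""
    result = {}
    for field in fields:
        proc = subprocess.run(
            ["modinfo", "-F", field, module],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        result[field] = proc.stdout.strip()
    return result


def mismatched_fields(expected, actual):
    """Return the sorted names of fields whose values differ between two dicts."""
    return sorted(k for k in expected if expected[k] != actual.get(k))
```

On the node, `mismatched_fields(VMCORE_MODULE, installed_module_fields("ocfs2_dlm"))` returning an empty list suggests the installed module is the same build as the one in the vmcore; any field name returned points to a different build.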
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
