General Protection Fault or Hard LOCKUP occurs when `edac` module is unloaded in Red Hat Enterprise Linux 6/7
Issue
- The kernel panics with the following logs:
[10734746.234065] EDAC MC: Removed device 0 for sbridge_edac.c Haswell Socket#0: DEV 0000:7f:12.0
[10734746.239171] EDAC MC: Removed device 1 for sbridge_edac.c Haswell Socket#1: DEV 0000:ff:12.0
[10734747.234430] general protection fault: 0000 [#1] SMP
[10734747.240317] last sysfs file: /sys/devices/system/cpu/online
[10734747.246853] CPU 29
[10734747.249302] Modules linked in: iptable_filter ip_tables ext3 xfs ext2 jbd ktap_106973(U) vxodm(P)(U) vxgms(P)(U) secvm2(P)(U) secfs2(P)(U) amf(P)(U) vxglm(P)(U) vxfen(P)(U) gab(P)(U) llt(P)(U) rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfs lockd fscache auth_rpcgss nfs_acl sunrpc dmpaa(P)(U) vxspec(P)(U) vxio(P)(U) vxdmp(P)(U) bonding vxcafs(P)(U) vxportal(P)(U) fdd(P)(U) vxfs(P)(U) exportfs microcode iTCO_wdt iTCO_vendor_support lpc_ich mfd_core sg joydev power_meter acpi_ipmi ipmi_si ipmi_msghandler shpchp ext4 jbd2 mbcache dm_round_robin sd_mod crc_t10dif fnic(U) libfcoe libfc scsi_transport_fc scsi_tgt enic(U) crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi wmi dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: edac_core]
[10734747.337378]
[10734747.339519] Pid: 0, comm: swapper Tainted: P W -- ------------ 2.6.32-754.35.1.el6.x86_64 #1 Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4
[10734747.355020] RIP: 0010:[<ffffffff8155decf>] [<ffffffff8155decf>] _spin_lock_irqsave+0x1f/0x40
[10734747.365238] RSP: 0018:ffff8801216c3df0 EFLAGS: 00010086
[10734747.371660] RAX: 0000000000010000 RBX: 6364f06480897467 RCX: 0000000000000000
[10734747.380315] RDX: 0000000000000286 RSI: ffff88405275c148 RDI: 6364f06480897467
[10734747.388964] RBP: ffff8801216c3df0 R08: ffff882052810440 R09: 00261d6d81d70c80
[10734747.397614] R10: 0000000000000001 R11: 00000000000000f5 R12: ffff88405275c168
[10734747.406262] R13: ffff88405275c148 R14: ffff88405275c148 R15: ffffffff810a5d90
[10734747.414910] FS: 0000000000000000(0000) GS:ffff8801216c0000(0000) knlGS:0000000000000000
[10734747.424612] CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033
[10734747.431514] CR2: 00007f84d8112000 CR3: 0000000001a8e000 CR4: 00000000001607e0
[10734747.440159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10734747.448801] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[10734747.457441] Process swapper (pid: 0, threadinfo ffff88205283c000, task ffff88205280a040)
[10734747.467154] Stack:
[10734747.469876] ffff8801216c3e20 ffffffff810a5cf4 ffff88405275c168 ffff882052810000
[10734747.478476] <d> ffff88405275c168 ffff8801216c3e80 ffff8801216c3e30 ffffffff810a5dc9
[10734747.487782] <d> ffff8801216c3ec0 ffffffff81094fb9 ffffffff81094806 ffff882052811c20
[10734747.497466] Call Trace:
[10734747.500675] <IRQ>
[10734747.503517] [<ffffffff810a5cf4>] __queue_work+0x24/0x50
[10734747.509942] [<ffffffff810a5dc9>] delayed_work_timer_fn+0x39/0x50
[10734747.517244] [<ffffffff81094fb9>] run_timer_softirq+0x199/0x350
[10734747.524346] [<ffffffff81094806>] ? update_process_times+0x76/0x90
[10734747.531743] [<ffffffff8108aa6a>] __do_softirq+0xea/0x240
[10734747.538268] [<ffffffff8156791c>] call_softirq+0x1c/0x30
[10734747.544691] [<ffffffff8100e535>] do_softirq+0x65/0xa0
[10734747.550911] [<ffffffff8108a6fd>] irq_exit+0x8d/0xa0
[10734747.556946] [<ffffffff8156885e>] smp_apic_timer_interrupt+0x4e/0x60
[10734747.564547] [<ffffffff81567193>] apic_timer_interrupt+0x13/0x20
[10734747.571750] <EOI>
[10734747.574599] [<ffffffff81309c4e>] ? intel_idle+0x13e/0x260
[10734747.581223] [<ffffffff81309c31>] ? intel_idle+0x121/0x260
[10734747.587851] [<ffffffff814524ae>] cpuidle_idle_call+0x8e/0xf0
[10734747.594769] [<ffffffff8100a169>] cpu_idle+0xd9/0x180
[10734747.600906] [<ffffffff81553e68>] start_secondary+0x314/0x36a
[10734747.607818] Code: c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 0f b7 c8 c1 e8 10 39 c1 74 0e f3 90 0f b7 0f eb f5
[10734747.630656] RIP [<ffffffff8155decf>] _spin_lock_irqsave+0x1f/0x40
[10734747.638080] RSP <ffff8801216c3df0>
- Another crash pattern (NULL pointer dereference in `__list_add`):
EDAC MC: Removed device 0 for sbridge_edac.c Haswell Socket#0: DEV 0000:3f:12.0
EDAC MC: Removed device 1 for sbridge_edac.c Haswell Socket#1: DEV 0000:7f:12.0
EDAC MC: Removed device 2 for sbridge_edac.c Haswell Socket#2: DEV 0000:bf:12.0
EDAC MC: Removed device 3 for sbridge_edac.c Haswell Socket#3: DEV 0000:ff:12.0
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff812b9ef6>] __list_add+0x26/0xa0
Kernel PGD 0
User PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/online
CPU 113
Modules linked in: tcp_diag inet_diag mptctl mptbase autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc falcon_lsm_serviceable(P)(U) falcon_nf_netcontain(P)(U) falcon_kal(U) falcon_lsm_pinned_13003(U) bonding ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ocrdma ib_core ib_addr ipv6 dm_service_time dm_multipath microcode iTCO_wdt iTCO_vendor_support ipmi_devintf serio_raw lpfc scsi_dh_emc scsi_transport_fc scsi_tgt joydev lpc_ich mfd_core hpilo hpwdt power_meter acpi_ipmi ipmi_si ipmi_msghandler ioatdma dca tg3 ptp pps_core sg ext4 jbd2 mbcache dm_snapshot dm_bufio sd_mod crc_t10dif be2net hpsa wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: edac_core]
Pid: 0, comm: swapper Tainted: P -- ------------ 2.6.32-754.41.2.el6.x86_64 #1 HP ProLiant DL580 Gen9/ProLiant DL580 Gen9
RIP: 0010:[<ffffffff812b9ef6>] [<ffffffff812b9ef6>] __list_add+0x26/0xa0
RSP: 0018:ffff898242643d90 EFLAGS: 00010046
RAX: ffff880272200021 RBX: ffff88ffca314150 RCX: 0000000000000000
RDX: ffff880272200028 RSI: 0000000000000000 RDI: ffff88ffca314150
RBP: ffff898242643db0 R08: 0000000000000000 R09: 00010598e67f9740
R10: 0000000000000000 R11: 0000000000000039 R12: ffff880272200028
R13: 0000000000000000 R14: ffff88ffca314148 R15: ffffffff810a5d90
FS: 0000000000000000(0000) GS:ffff898242640000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001a8e000 CR4: 00000000001607e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff887fce294000, task ffff887fce27e040)
Stack:
0000000000000071 ffff88ffca314148 ffff880272200020 0000000000000000
<d> ffff898242643df0 ffffffff810a5627 0000000000000082 ffff898242658c00
<d> ffff880272200020 0000000000000286 ffff88ffca314148 ffff88ffca314148
Call Trace:
<IRQ>
[<ffffffff810a5627>] insert_work+0x57/0xc0
[<ffffffff810a5d06>] __queue_work+0x36/0x50
[<ffffffff8106d954>] ? scheduler_tick+0x124/0x270
[<ffffffff810a5dc9>] delayed_work_timer_fn+0x39/0x50
[<ffffffff81094fb9>] run_timer_softirq+0x199/0x350
[<ffffffff81094806>] ? update_process_times+0x76/0x90
[<ffffffff8103f942>] ? native_apic_msr_write+0x32/0x40
[<ffffffff8108aa6a>] __do_softirq+0xea/0x240
[<ffffffff8156791c>] call_softirq+0x1c/0x30
[<ffffffff8100e535>] do_softirq+0x65/0xa0
[<ffffffff8108a6fd>] irq_exit+0x8d/0xa0
[<ffffffff8156885e>] smp_apic_timer_interrupt+0x4e/0x60
[<ffffffff81567193>] apic_timer_interrupt+0x13/0x20
<EOI>
[<ffffffff81309bae>] ? intel_idle+0x13e/0x260
[<ffffffff81309b91>] ? intel_idle+0x121/0x260
[<ffffffff8145240e>] cpuidle_idle_call+0x8e/0xf0
[<ffffffff8100a169>] cpu_idle+0xd9/0x180
[<ffffffff81553dd8>] start_secondary+0x314/0x36a
Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 89 fb 4c 89 6d f8 4c 8b 42 08 49 89 f5 49 89 d4 49 39 f0 75 27 <4d> 8b 45 00 4d 39 c4 75 40 49 89 5c 24 08 4c 89 23 4c 89 6b 08
RIP [<ffffffff812b9ef6>] __list_add+0x26/0xa0
RSP <ffff898242643d90>
CR2: 0000000000000000
- A third crash pattern (hard LOCKUP on Red Hat Enterprise Linux 7):
[3564457.969112] usb 1-1.6.2: USB disconnect, device number 74
[3564461.551423] EDAC MC: Removed device 0 for sb_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:7f:12.0
[3564461.578365] EDAC MC: Removed device 1 for sb_edac.c Broadwell SrcID#1_Ha#0: DEV 0000:ff:12.0
[3564461.592353] EDAC MC: Removed device 2 for sb_edac.c Broadwell SrcID#0_Ha#1: DEV 0000:7f:12.4
[3564461.606350] EDAC MC: Removed device 3 for sb_edac.c Broadwell SrcID#1_Ha#1: DEV 0000:ff:12.4
[3564461.862138] BUG: unable to handle kernel
[3564480.573256] NMI watchdog: Watchdog detected hard LOCKUP on cpu 19
[3564480.573256] Modules linked in:
[3564480.573258] iptable_filter
[3564480.573259] mpt2sas
**** Lines Trimmed ***
[3564480.573314] dm_mod
[3564480.573314] [last unloaded: sb_edac]
[3564480.573315]
[3564480.573318] CPU: 19 PID: 0 Comm: swapper/19 Kdump: loaded Not tainted 3.10.0-1160.45.1.el7.x86_64 #1
[3564480.573319] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.11.0 11/02/2019
[3564480.573321] task: ffff91bb1f779080 ti: ffff91bb1f790000 task.ti: ffff91bb1f790000
[3564480.573322] RIP: 0010:[<ffffffff9f117b50>]
[3564480.573328] [<ffffffff9f117b50>] native_queued_spin_lock_slowpath+0x1d0/0x200
[3564480.573329] RSP: 0018:ffff91bb1f7938e0 EFLAGS: 00000002
[3564480.573330] RAX: 0000000002390101 RBX: 0000000000000082 RCX: 0000000000000001
[3564480.573331] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff91f97ee53940
[3564480.573332] RBP: ffff91bb1f7938e0 R08: 0000000000000101 R09: 0000000000000000
[3564480.573333] R10: 0000000000000000 R11: ffff91bb1f793966 R12: ffff91f97ee53940
[3564480.573334] R13: ffff91f9722e9cf0 R14: ffff91bb1f793950 R15: 0000000000000e20
[3564480.573336] FS: 0000000000000000(0000) GS:ffff91f97ee40000(0000) knlGS:0000000000000000
[3564480.573337] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3564480.573338] CR2: 0000000000000018 CR3: 0000005d31dda000 CR4: 00000000003607e0
[3564480.573339] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3564480.573340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[3564480.573341] Call Trace:
[3564480.573348] [<ffffffff9f77dcf3>] queued_spin_lock_slowpath+0xb/0xf
[3564480.573353] [<ffffffff9f78bb27>] _raw_spin_lock_irqsave+0x37/0x40
[3564480.573357] [<ffffffff9f0adbcb>] lock_timer_base.isra.38+0x2b/0x50
[3564480.573360] [<ffffffff9f0aed04>] mod_timer+0x84/0x230
[3564480.573362] [<ffffffff9f0aeec8>] add_timer+0x18/0x20
[3564480.573366] [<ffffffff9f404d79>] fbcon_add_cursor_timer+0x99/0xf0
[3564480.573368] [<ffffffff9f40815a>] fbcon_cursor+0xaa/0x1c0
[3564480.573372] [<ffffffff9f393f34>] ? vsnprintf+0x234/0x6a0
[3564480.573375] [<ffffffff9f47eaa3>] hide_cursor+0x33/0xa0
[3564480.573378] [<ffffffff9f480528>] vt_console_print+0x3e8/0x430
[3564480.573380] [<ffffffff9f394526>] ? sprintf+0x56/0x80
[3564480.573383] [<ffffffff9f09c513>] call_console_drivers.constprop.19+0x93/0xf0
[3564480.573385] [<ffffffff9f09dbeb>] console_unlock+0x33b/0x4b0
[3564480.573388] [<ffffffff9f0cbdad>] ? down_trylock+0x2d/0x40
[3564480.573390] [<ffffffff9f09e124>] vprintk_emit+0x3c4/0x510
[3564480.573392] [<ffffffff9f09e4d9>] vprintk_default+0x29/0x40
[3564480.573394] [<ffffffff9f77d3d8>] printk+0x60/0x77
[3564480.573399] [<ffffffff9f075de2>] no_context+0x212/0x300
[3564480.573401] [<ffffffff9f075fe2>] __bad_area_nosemaphore+0x112/0x220
[3564480.573403] [<ffffffff9f076104>] bad_area_nosemaphore+0x14/0x20
[3564480.573407] [<ffffffff9f790750>] __do_page_fault+0x310/0x500
[3564480.573409] [<ffffffff9f790975>] do_page_fault+0x35/0x90
[3564480.573411] [<ffffffff9f78c778>] page_fault+0x28/0x30
[3564480.573414] [<ffffffff9f0af088>] ? get_next_timer_interrupt+0x1b8/0x260
[3564480.573416] [<ffffffff9f0aef26>] ? get_next_timer_interrupt+0x56/0x260
[3564480.573420] [<ffffffff9f110a07>] tick_nohz_stop_sched_tick+0x1f7/0x390
[3564480.573422] [<ffffffff9f110c3f>] __tick_nohz_idle_enter+0x9f/0x170
[3564480.573424] [<ffffffff9f11118f>] tick_nohz_idle_enter+0x3f/0x70
[3564480.573428] [<ffffffff9f101777>] cpu_startup_entry+0xa7/0x1e0
[3564480.573433] [<ffffffff9f05a827>] start_secondary+0x1f7/0x270
[3564480.573436] [<ffffffff9f0000d5>] start_cpu+0x5/0x14
[3564480.573437] Code:
[3564480.573438] fe
**** Lines Trimmed ***
[3564480.573465] 0f
[3564480.573465] 1f
[3564480.573465]
[3564480.573466] Kernel panic - not syncing: Hard LOCKUP
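The three logs above share a recognizable signature: `EDAC MC: Removed device` messages, a `[last unloaded: ...]` entry naming an EDAC module (`edac_core` or `sb_edac`), and then a general protection fault, a NULL pointer dereference, or a hard LOCKUP. The following is a minimal sketch, assuming the console or vmcore-dmesg output has been saved as plain text, of how a log could be checked for that combination. The function name, the returned dictionary, and the regular expressions are illustrative assumptions derived only from the excerpts above; this is not a Red Hat tool and a match is only a hint, not a diagnosis.

```python
import re

# Patterns are taken from the log excerpts above. The function name and
# the shape of the result are hypothetical, chosen for illustration.
REMOVED_RE = re.compile(r"EDAC MC: Removed device \d+ for (\S+)")
LAST_UNLOADED_RE = re.compile(r"\[last unloaded: (\w*edac\w*)\]")
CRASH_RES = {
    "general protection fault": re.compile(r"general protection fault"),
    "NULL pointer dereference": re.compile(
        r"unable to handle kernel NULL pointer dereference"
    ),
    "hard LOCKUP": re.compile(r"hard LOCKUP", re.IGNORECASE),
}


def match_edac_unload_signature(log_text):
    """Report whether log_text resembles the crash logs in this article."""
    removed = REMOVED_RE.search(log_text)
    unloaded = LAST_UNLOADED_RE.search(log_text)
    # Collect every crash type whose message appears anywhere in the log.
    crashes = [name for name, rx in CRASH_RES.items() if rx.search(log_text)]
    return {
        "edac_removed": removed is not None,
        "last_unloaded": unloaded.group(1) if unloaded else None,
        "crashes": crashes,
        # All three elements must be present for the signature to match.
        "matches": bool(removed and unloaded and crashes),
    }
```

A typical use would be to read the saved log with `open(path).read()` and pass the text to `match_edac_unload_signature`; the `matches` field is true only when all three elements of the signature are present.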
Environment
- Red Hat Enterprise Linux 7
- Red Hat Enterprise Linux 6
- Cisco UCS Series
- `edac` module