General Protection Fault or Hard LOCKUP occurs when the `edac` module is unloaded in Red Hat Enterprise Linux 6/7

Solution In Progress - Updated -

Issue

  • The kernel panics with the following logs:
[10734746.234065] EDAC MC: Removed device 0 for sbridge_edac.c Haswell Socket#0: DEV 0000:7f:12.0
[10734746.239171] EDAC MC: Removed device 1 for sbridge_edac.c Haswell Socket#1: DEV 0000:ff:12.0
[10734747.234430] general protection fault: 0000 [#1] SMP 
[10734747.240317] last sysfs file: /sys/devices/system/cpu/online
[10734747.246853] CPU 29 
[10734747.249302] Modules linked in: iptable_filter ip_tables ext3 xfs ext2 jbd ktap_106973(U) vxodm(P)(U) vxgms(P)(U) secvm2(P)(U) secfs2(P)(U) amf(P)(U) vxglm(P)(U) vxfen(P)(U) gab(P)(U) llt(P)(U) rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr nfs lockd fscache auth_rpcgss nfs_acl sunrpc dmpaa(P)(U) vxspec(P)(U) vxio(P)(U) vxdmp(P)(U) bonding vxcafs(P)(U) vxportal(P)(U) fdd(P)(U) vxfs(P)(U) exportfs microcode iTCO_wdt iTCO_vendor_support lpc_ich mfd_core sg joydev power_meter acpi_ipmi ipmi_si ipmi_msghandler shpchp ext4 jbd2 mbcache dm_round_robin sd_mod crc_t10dif fnic(U) libfcoe libfc scsi_transport_fc scsi_tgt enic(U) crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi wmi dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: edac_core]
[10734747.337378] 
[10734747.339519] Pid: 0, comm: swapper Tainted: P        W  -- ------------    2.6.32-754.35.1.el6.x86_64 #1 Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4
[10734747.355020] RIP: 0010:[<ffffffff8155decf>]  [<ffffffff8155decf>] _spin_lock_irqsave+0x1f/0x40
[10734747.365238] RSP: 0018:ffff8801216c3df0  EFLAGS: 00010086
[10734747.371660] RAX: 0000000000010000 RBX: 6364f06480897467 RCX: 0000000000000000
[10734747.380315] RDX: 0000000000000286 RSI: ffff88405275c148 RDI: 6364f06480897467
[10734747.388964] RBP: ffff8801216c3df0 R08: ffff882052810440 R09: 00261d6d81d70c80
[10734747.397614] R10: 0000000000000001 R11: 00000000000000f5 R12: ffff88405275c168
[10734747.406262] R13: ffff88405275c148 R14: ffff88405275c148 R15: ffffffff810a5d90
[10734747.414910] FS:  0000000000000000(0000) GS:ffff8801216c0000(0000) knlGS:0000000000000000
[10734747.424612] CS:  0010 DS: 0018 ES: 0018 CR0: 0000000080050033
[10734747.431514] CR2: 00007f84d8112000 CR3: 0000000001a8e000 CR4: 00000000001607e0
[10734747.440159] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[10734747.448801] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[10734747.457441] Process swapper (pid: 0, threadinfo ffff88205283c000, task ffff88205280a040)
[10734747.467154] Stack:
[10734747.469876]  ffff8801216c3e20 ffffffff810a5cf4 ffff88405275c168 ffff882052810000
[10734747.478476] <d> ffff88405275c168 ffff8801216c3e80 ffff8801216c3e30 ffffffff810a5dc9
[10734747.487782] <d> ffff8801216c3ec0 ffffffff81094fb9 ffffffff81094806 ffff882052811c20
[10734747.497466] Call Trace:
[10734747.500675]  <IRQ> 
[10734747.503517]  [<ffffffff810a5cf4>] __queue_work+0x24/0x50
[10734747.509942]  [<ffffffff810a5dc9>] delayed_work_timer_fn+0x39/0x50
[10734747.517244]  [<ffffffff81094fb9>] run_timer_softirq+0x199/0x350
[10734747.524346]  [<ffffffff81094806>] ? update_process_times+0x76/0x90
[10734747.531743]  [<ffffffff8108aa6a>] __do_softirq+0xea/0x240
[10734747.538268]  [<ffffffff8156791c>] call_softirq+0x1c/0x30
[10734747.544691]  [<ffffffff8100e535>] do_softirq+0x65/0xa0
[10734747.550911]  [<ffffffff8108a6fd>] irq_exit+0x8d/0xa0
[10734747.556946]  [<ffffffff8156885e>] smp_apic_timer_interrupt+0x4e/0x60
[10734747.564547]  [<ffffffff81567193>] apic_timer_interrupt+0x13/0x20
[10734747.571750]  <EOI> 
[10734747.574599]  [<ffffffff81309c4e>] ? intel_idle+0x13e/0x260
[10734747.581223]  [<ffffffff81309c31>] ? intel_idle+0x121/0x260
[10734747.587851]  [<ffffffff814524ae>] cpuidle_idle_call+0x8e/0xf0
[10734747.594769]  [<ffffffff8100a169>] cpu_idle+0xd9/0x180
[10734747.600906]  [<ffffffff81553e68>] start_secondary+0x314/0x36a
[10734747.607818] Code: c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 9c 58 0f 1f 44 00 00 48 89 c2 fa 66 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 0f b7 c8 c1 e8 10 39 c1 74 0e f3 90 0f b7 0f eb f5 
[10734747.630656] RIP  [<ffffffff8155decf>] _spin_lock_irqsave+0x1f/0x40
[10734747.638080]  RSP <ffff8801216c3df0>
  • A second crash pattern, a NULL pointer dereference in `__list_add`:
EDAC MC: Removed device 0 for sbridge_edac.c Haswell Socket#0: DEV 0000:3f:12.0
EDAC MC: Removed device 1 for sbridge_edac.c Haswell Socket#1: DEV 0000:7f:12.0
EDAC MC: Removed device 2 for sbridge_edac.c Haswell Socket#2: DEV 0000:bf:12.0
EDAC MC: Removed device 3 for sbridge_edac.c Haswell Socket#3: DEV 0000:ff:12.0
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff812b9ef6>] __list_add+0x26/0xa0
Kernel PGD 0 
User   PGD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/online
CPU 113 
Modules linked in: tcp_diag inet_diag mptctl mptbase autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc falcon_lsm_serviceable(P)(U) falcon_nf_netcontain(P)(U) falcon_kal(U) falcon_lsm_pinned_13003(U) bonding ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ocrdma ib_core ib_addr ipv6 dm_service_time dm_multipath microcode iTCO_wdt iTCO_vendor_support ipmi_devintf serio_raw lpfc scsi_dh_emc scsi_transport_fc scsi_tgt joydev lpc_ich mfd_core hpilo hpwdt power_meter acpi_ipmi ipmi_si ipmi_msghandler ioatdma dca tg3 ptp pps_core sg ext4 jbd2 mbcache dm_snapshot dm_bufio sd_mod crc_t10dif be2net hpsa wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: edac_core]

Pid: 0, comm: swapper Tainted: P           -- ------------    2.6.32-754.41.2.el6.x86_64 #1 HP ProLiant DL580 Gen9/ProLiant DL580 Gen9
RIP: 0010:[<ffffffff812b9ef6>]  [<ffffffff812b9ef6>] __list_add+0x26/0xa0
RSP: 0018:ffff898242643d90  EFLAGS: 00010046
RAX: ffff880272200021 RBX: ffff88ffca314150 RCX: 0000000000000000
RDX: ffff880272200028 RSI: 0000000000000000 RDI: ffff88ffca314150
RBP: ffff898242643db0 R08: 0000000000000000 R09: 00010598e67f9740
R10: 0000000000000000 R11: 0000000000000039 R12: ffff880272200028
R13: 0000000000000000 R14: ffff88ffca314148 R15: ffffffff810a5d90
FS:  0000000000000000(0000) GS:ffff898242640000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001a8e000 CR4: 00000000001607e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff887fce294000, task ffff887fce27e040)
Stack:
 0000000000000071 ffff88ffca314148 ffff880272200020 0000000000000000
<d> ffff898242643df0 ffffffff810a5627 0000000000000082 ffff898242658c00
<d> ffff880272200020 0000000000000286 ffff88ffca314148 ffff88ffca314148
Call Trace:
 <IRQ> 
 [<ffffffff810a5627>] insert_work+0x57/0xc0
 [<ffffffff810a5d06>] __queue_work+0x36/0x50
 [<ffffffff8106d954>] ? scheduler_tick+0x124/0x270
 [<ffffffff810a5dc9>] delayed_work_timer_fn+0x39/0x50
 [<ffffffff81094fb9>] run_timer_softirq+0x199/0x350
 [<ffffffff81094806>] ? update_process_times+0x76/0x90
 [<ffffffff8103f942>] ? native_apic_msr_write+0x32/0x40
 [<ffffffff8108aa6a>] __do_softirq+0xea/0x240
 [<ffffffff8156791c>] call_softirq+0x1c/0x30
 [<ffffffff8100e535>] do_softirq+0x65/0xa0
 [<ffffffff8108a6fd>] irq_exit+0x8d/0xa0
 [<ffffffff8156885e>] smp_apic_timer_interrupt+0x4e/0x60
 [<ffffffff81567193>] apic_timer_interrupt+0x13/0x20
 <EOI> 
 [<ffffffff81309bae>] ? intel_idle+0x13e/0x260
 [<ffffffff81309b91>] ? intel_idle+0x121/0x260
 [<ffffffff8145240e>] cpuidle_idle_call+0x8e/0xf0
 [<ffffffff8100a169>] cpu_idle+0xd9/0x180
 [<ffffffff81553dd8>] start_secondary+0x314/0x36a
Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 89 fb 4c 89 6d f8 4c 8b 42 08 49 89 f5 49 89 d4 49 39 f0 75 27 <4d> 8b 45 00 4d 39 c4 75 40 49 89 5c 24 08 4c 89 23 4c 89 6b 08 
RIP  [<ffffffff812b9ef6>] __list_add+0x26/0xa0
 RSP <ffff898242643d90>
CR2: 0000000000000000
  • A third crash pattern, a hard LOCKUP reported by the NMI watchdog on Red Hat Enterprise Linux 7:
[3564457.969112] usb 1-1.6.2: USB disconnect, device number 74
[3564461.551423] EDAC MC: Removed device 0 for sb_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:7f:12.0
[3564461.578365] EDAC MC: Removed device 1 for sb_edac.c Broadwell SrcID#1_Ha#0: DEV 0000:ff:12.0
[3564461.592353] EDAC MC: Removed device 2 for sb_edac.c Broadwell SrcID#0_Ha#1: DEV 0000:7f:12.4
[3564461.606350] EDAC MC: Removed device 3 for sb_edac.c Broadwell SrcID#1_Ha#1: DEV 0000:ff:12.4
[3564461.862138] BUG: unable to handle kernel 
[3564480.573256] NMI watchdog: Watchdog detected hard LOCKUP on cpu 19
[3564480.573256] Modules linked in:
[3564480.573258]  iptable_filter
[3564480.573259]  mpt2sas
**** Lines Trimmed ***
[3564480.573314]  dm_mod
[3564480.573314]  [last unloaded: sb_edac]
[3564480.573315] 
[3564480.573318] CPU: 19 PID: 0 Comm: swapper/19 Kdump: loaded Not tainted 3.10.0-1160.45.1.el7.x86_64 #1
[3564480.573319] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.11.0 11/02/2019
[3564480.573321] task: ffff91bb1f779080 ti: ffff91bb1f790000 task.ti: ffff91bb1f790000
[3564480.573322] RIP: 0010:[<ffffffff9f117b50>] 
[3564480.573328]  [<ffffffff9f117b50>] native_queued_spin_lock_slowpath+0x1d0/0x200
[3564480.573329] RSP: 0018:ffff91bb1f7938e0  EFLAGS: 00000002
[3564480.573330] RAX: 0000000002390101 RBX: 0000000000000082 RCX: 0000000000000001
[3564480.573331] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffff91f97ee53940
[3564480.573332] RBP: ffff91bb1f7938e0 R08: 0000000000000101 R09: 0000000000000000
[3564480.573333] R10: 0000000000000000 R11: ffff91bb1f793966 R12: ffff91f97ee53940
[3564480.573334] R13: ffff91f9722e9cf0 R14: ffff91bb1f793950 R15: 0000000000000e20
[3564480.573336] FS:  0000000000000000(0000) GS:ffff91f97ee40000(0000) knlGS:0000000000000000
[3564480.573337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3564480.573338] CR2: 0000000000000018 CR3: 0000005d31dda000 CR4: 00000000003607e0
[3564480.573339] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3564480.573340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[3564480.573341] Call Trace:
[3564480.573348]  [<ffffffff9f77dcf3>] queued_spin_lock_slowpath+0xb/0xf
[3564480.573353]  [<ffffffff9f78bb27>] _raw_spin_lock_irqsave+0x37/0x40
[3564480.573357]  [<ffffffff9f0adbcb>] lock_timer_base.isra.38+0x2b/0x50
[3564480.573360]  [<ffffffff9f0aed04>] mod_timer+0x84/0x230
[3564480.573362]  [<ffffffff9f0aeec8>] add_timer+0x18/0x20
[3564480.573366]  [<ffffffff9f404d79>] fbcon_add_cursor_timer+0x99/0xf0
[3564480.573368]  [<ffffffff9f40815a>] fbcon_cursor+0xaa/0x1c0
[3564480.573372]  [<ffffffff9f393f34>] ? vsnprintf+0x234/0x6a0
[3564480.573375]  [<ffffffff9f47eaa3>] hide_cursor+0x33/0xa0
[3564480.573378]  [<ffffffff9f480528>] vt_console_print+0x3e8/0x430
[3564480.573380]  [<ffffffff9f394526>] ? sprintf+0x56/0x80
[3564480.573383]  [<ffffffff9f09c513>] call_console_drivers.constprop.19+0x93/0xf0
[3564480.573385]  [<ffffffff9f09dbeb>] console_unlock+0x33b/0x4b0
[3564480.573388]  [<ffffffff9f0cbdad>] ? down_trylock+0x2d/0x40
[3564480.573390]  [<ffffffff9f09e124>] vprintk_emit+0x3c4/0x510
[3564480.573392]  [<ffffffff9f09e4d9>] vprintk_default+0x29/0x40
[3564480.573394]  [<ffffffff9f77d3d8>] printk+0x60/0x77
[3564480.573399]  [<ffffffff9f075de2>] no_context+0x212/0x300
[3564480.573401]  [<ffffffff9f075fe2>] __bad_area_nosemaphore+0x112/0x220
[3564480.573403]  [<ffffffff9f076104>] bad_area_nosemaphore+0x14/0x20
[3564480.573407]  [<ffffffff9f790750>] __do_page_fault+0x310/0x500
[3564480.573409]  [<ffffffff9f790975>] do_page_fault+0x35/0x90
[3564480.573411]  [<ffffffff9f78c778>] page_fault+0x28/0x30
[3564480.573414]  [<ffffffff9f0af088>] ? get_next_timer_interrupt+0x1b8/0x260
[3564480.573416]  [<ffffffff9f0aef26>] ? get_next_timer_interrupt+0x56/0x260
[3564480.573420]  [<ffffffff9f110a07>] tick_nohz_stop_sched_tick+0x1f7/0x390
[3564480.573422]  [<ffffffff9f110c3f>] __tick_nohz_idle_enter+0x9f/0x170
[3564480.573424]  [<ffffffff9f11118f>] tick_nohz_idle_enter+0x3f/0x70
[3564480.573428]  [<ffffffff9f101777>] cpu_startup_entry+0xa7/0x1e0
[3564480.573433]  [<ffffffff9f05a827>] start_secondary+0x1f7/0x270
[3564480.573436]  [<ffffffff9f0000d5>] start_cpu+0x5/0x14
[3564480.573437] Code: 
[3564480.573438] fe 
**** Lines Trimmed ***
[3564480.573465] 0f 
[3564480.573465] 1f 
[3564480.573465] 
[3564480.573466] Kernel panic - not syncing: Hard LOCKUP
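
All of the logs above list an EDAC module (`edac_core` or `sb_edac`) as the last unloaded module shortly before the crash. As a rough triage aid, the following sketch checks a saved console or `vmcore-dmesg` log for that combination. The function name and the marker list are illustrative assumptions based only on the strings visible in these logs. A match is a hint that a crash may belong to this issue, not a confirmed diagnosis.

```python
import re

# Pattern for the "[last unloaded: ...]" tag seen in the logs above,
# restricted to module names that contain "edac".
LAST_UNLOADED_EDAC = re.compile(r"\[last unloaded: (\w*edac\w*)\]")

# Crash strings that appear in the three example logs (assumed list,
# not exhaustive).
CRASH_MARKERS = (
    "general protection fault",
    "unable to handle kernel",
    "hard lockup",
)


def matches_edac_unload_crash(log_text):
    """Return the last-unloaded EDAC module name if the log contains
    both that tag and one of the crash markers; otherwise return None."""
    match = LAST_UNLOADED_EDAC.search(log_text)
    if match is None:
        return None
    lowered = log_text.lower()
    if any(marker in lowered for marker in CRASH_MARKERS):
        return match.group(1)
    return None
```

For example, passing the text of the first log to `matches_edac_unload_crash()` would return the module name `edac_core`, while a log whose last unloaded module is unrelated to EDAC returns `None`.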

Environment

  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 6
  • Cisco UCS Series
  • `edac` module
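
To check whether a system has an EDAC module loaded at all, the module list in `/proc/modules` can be inspected. The sketch below is a minimal example and not part of the original article. It assumes the standard `/proc/modules` layout, in which the module name is the first whitespace-separated field on each line, and the helper name is hypothetical.

```python
def loaded_edac_modules(proc_modules_text):
    """Return the names of loaded modules whose name contains 'edac'.

    proc_modules_text is the content of /proc/modules, where each line
    starts with the module name.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and "edac" in fields[0]:
            names.append(fields[0])
    return names


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            print(loaded_edac_modules(handle.read()))
    except OSError:
        print("cannot read /proc/modules")
```

An empty list means no module with `edac` in its name is currently loaded.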
