Server crashed with multiple hard lockups after numerous MCE events were logged
Issue
- The server crashed with hard lockups detected on multiple CPUs (CPU 32 and CPU 62). Numerous machine check exception (MCE) events were logged before the crash, followed by a CMCI storm:
[1780915.816854] mce: [Hardware Error]: Machine check events logged
[2176919.599198] mce: [Hardware Error]: Machine check events logged
[2829531.332234] mce: [Hardware Error]: Machine check events logged
[2919937.665378] mce: [Hardware Error]: Machine check events logged
[3038978.496527] mce: [Hardware Error]: Machine check events logged
[3323876.767276] mce: [Hardware Error]: Machine check events logged
[3647175.065549] mce: [Hardware Error]: Machine check events logged
[3731430.668298] mce: [Hardware Error]: Machine check events logged
[3886067.562527] mce: [Hardware Error]: Machine check events logged
[4473983.637762] mce: [Hardware Error]: Machine check events logged
[4567374.662809] mce: [Hardware Error]: Machine check events logged
[4923388.114969] mce: [Hardware Error]: Machine check events logged
[5000803.805808] mce: [Hardware Error]: Machine check events logged
[5518009.178445] mce: [Hardware Error]: Machine check events logged
[5856232.630047] mce: [Hardware Error]: Machine check events logged
[7221486.591468] mce: [Hardware Error]: Machine check events logged
[7221486.606332] mce: [Hardware Error]: Machine check events logged
[7825137.868388] mce: [Hardware Error]: Machine check events logged
[7917809.176188] mce: [Hardware Error]: Machine check events logged
[8157046.692590] mce: [Hardware Error]: Machine check events logged
[8611445.971314] mce: [Hardware Error]: Machine check events logged
[8752957.459324] mce: [Hardware Error]: Machine check events logged
[9135446.374342] mce: [Hardware Error]: Machine check events logged
[9723422.396207] mce: [Hardware Error]: Machine check events logged
[9798483.563618] mce: [Hardware Error]: Machine check events logged
[10162160.576292] mce: [Hardware Error]: Machine check events logged
[10162251.862879] mce: [Hardware Error]: Machine check events logged
[10162251.964029] mce: [Hardware Error]: Machine check events logged
[10162260.577124] CMCI storm detected: switching to poll mode
[10162260.734937] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.066 msecs
[10162260.736870] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.903 msecs
[10162260.738734] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 156.988 msecs
[10162260.738749] hrtimer: interrupt took 2800405 ns
[10162260.738752] perf: interrupt took too long (22821 > 16611), lowering kernel.perf_event_max_sample_rate to 8000
[10162262.720130] perf: interrupt took too long (40302 > 28526), lowering kernel.perf_event_max_sample_rate to 4000
[10162268.024932] sched: RT throttling activated
[10162277.031346] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1000.682 msecs
[10162278.032120] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 2001.366 msecs
[10162278.032128] perf: interrupt took too long (15651287 > 9794597), lowering kernel.perf_event_max_sample_rate to 1000
[10162278.032133] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 7005.029 msecs
[10162278.032136] perf: interrupt took too long (54735333 > 19564108), lowering kernel.perf_event_max_sample_rate to 1000
[10162287.038868] NMI watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [titanagent:2336749]
[10162288.039250] Modules linked in: target_core_pscsi target_core_file target_core_iblock binfmt_misc tcp_diag inet_diag target_core_user uio rpcrdma ib_isert sunrpc iscsi_target_mod
[10162288.039252] NMI watchdog: Watchdog detected hard LOCKUP on cpu 32
[10162288.039318] Modules linked in: target_core_pscsi target_core_file target_core_iblock binfmt_misc tcp_diag inet_diag target_core_user uio rpcrdma ib_isert sunrpc iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm vfat fat intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper ocrdma(T) cdc_ether iTCO_wdt ib_core gpio_ich iTCO_vendor_support pcspkr cryptd usbnet lpc_ich mii sg ioatdma i2c_i801 i7core_edac ipmi_si ipmi_devintf pcc_cpufreq ipmi_msghandler acpi_cpufreq ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea ata_piix sysfillrect
[10162288.039336] sysimgblt igb fb_sys_fops ptp ttm crct10dif_pclmul drm crct10dif_common pps_core crc32c_intel dca libata serio_raw megaraid_sas be2net bnx2 drm_panel_orientation_quirks i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod
[10162288.039341] CPU: 32 PID: 4476 Comm: tp_osd_tp Kdump: loaded Tainted: G I ------------ T 3.10.0-1062.21.1.el7.x86_64 #1
[10162288.039342] Hardware name: IBM System x3850 X5 -[7143SPM]-/Node 1, Processor Card, BIOS -[G0E173BUS-1.73]- 01/21/2012
[10162288.039344] task: ffff88b5dc66c1c0 ti: ffff88b5dbb68000 task.ti: ffff88b5dbb68000
[10162288.039352] RIP: 0010:[<ffffffff938d51b4>] [<ffffffff938d51b4>] finish_task_switch+0x54/0x1c0
[10162288.039353] RSP: 0018:ffff88b5dbb6bc08 EFLAGS: 00000282
[10162288.039355] RAX: ffff88b5d6805230 RBX: ffff88b5dbb6bbb8 RCX: 00000000c0000100
[10162288.039356] RDX: ffff88b5dbb69fd8 RSI: ffff88b5dc66c1c0 RDI: ffff88a5dfc1ac80
[10162288.039358] RBP: ffff88b5dbb6bc28 R08: ffff88b5dbb68000 R09: 0000000000000001
[10162288.039359] R10: 0000000000000001 R11: ffff88bddd638a00 R12: ffff88b5dbb6bbb8
[10162288.039360] R13: ffffffff938e507c R14: ffff88b5dbb6bbb0 R15: 0000000000000001
[10162288.039363] FS: 00007f5a92131700(0000) GS:ffff88a5dfc00000(0000) knlGS:0000000000000000
[10162288.039364] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10162288.039366] CR2: 000055e5485cd698 CR3: 000000205d9f8000 CR4: 00000000000207e0
[10162288.039367] Call Trace:
[10162288.039373] [<ffffffff93f80d92>] __schedule+0x402/0x840
[10162288.039376] [<ffffffff93f811f9>] schedule+0x29/0x70
[10162288.039384] [<ffffffff93912306>] futex_wait_queue_me+0xc6/0x130
[10162288.039387] [<ffffffff939130ab>] futex_wait+0x17b/0x280
[10162288.039390] [<ffffffff938d7959>] ? ttwu_do_wakeup+0x19/0xe0
[10162288.039396] [<ffffffff938c9fe0>] ? hrtimer_get_res+0x50/0x50
[10162288.039399] [<ffffffff939122e4>] ? futex_wait_queue_me+0xa4/0x130
[10162288.039402] [<ffffffff93914df6>] do_futex+0x106/0x5a0
[10162288.039406] [<ffffffff93915310>] SyS_futex+0x80/0x190
[10162288.039411] [<ffffffff93f8dede>] system_call_fastpath+0x25/0x2a
[10162288.039414] [<ffffffff93f8de21>] ? system_call_after_swapgs+0xae/0x146
[10162288.039440] Code: 8b 36 66 66 66 66 90 65 48 8b 34 25 80 0e 01 00 66 66 66 66 90 41 c7 45 28 00 00 00 00 48 89 df c6 07 00 66 66 66 90 fb 66 66 90 <66> 66 90 65 48 8b 04 25 80 0e 01 00 48 8b 98 78 01 00 00 48 85
[10162288.039442] Kernel panic - not syncing: Hard LOCKUP
[10162288.039445] CPU: 32 PID: 4476 Comm: tp_osd_tp Kdump: loaded Tainted: G I ------------ T 3.10.0-1062.21.1.el7.x86_64 #1
[10162288.039446] Hardware name: IBM System x3850 X5 -[7143SPM]-/Node 1, Processor Card, BIOS -[G0E173BUS-1.73]- 01/21/2012
[10162288.039446] Call Trace:
[10162288.039454] <NMI> [<ffffffff93f7b416>] dump_stack+0x19/0x1b
[10162288.039458] [<ffffffff93f74a0b>] panic+0xe8/0x21f
[10162288.039463] [<ffffffff9382f958>] ? show_regs+0x58/0x290
[10162288.039470] [<ffffffff9389b88f>] nmi_panic+0x3f/0x40
[10162288.039475] [<ffffffff9394eda1>] watchdog_overflow_callback+0x121/0x140
[10162288.039480] [<ffffffff939a8497>] __perf_event_overflow+0x57/0x100
[10162288.039485] [<ffffffff939b1c34>] perf_event_overflow+0x14/0x20
[10162288.039489] [<ffffffff9380ac70>] handle_pmi_common+0x1a0/0x250
[10162288.039493] [<ffffffff9380af4f>] intel_pmu_handle_irq+0xcf/0x1d0
[10162288.039496] [<ffffffff93f84031>] perf_event_nmi_handler+0x31/0x50
[10162288.039499] [<ffffffff93f8593c>] nmi_handle.isra.0+0x8c/0x150
[10162288.039501] [<ffffffff93f841bb>] ? save_paranoid+0xfb/0x140
[10162288.039504] [<ffffffff93f85c18>] do_nmi+0x218/0x460
[10162288.039506] [<ffffffff93f841af>] ? save_paranoid+0xef/0x140
[10162288.039509] [<ffffffff93f84d9c>] end_repeat_nmi+0x1e/0x81
[10162288.039512] [<ffffffff939176c6>] ? native_queued_spin_lock_slowpath+0x126/0x200
[10162288.039514] [<ffffffff939176c6>] ? native_queued_spin_lock_slowpath+0x126/0x200
[10162288.039517] [<ffffffff939176c6>] ? native_queued_spin_lock_slowpath+0x126/0x200
[10162288.039521] <EOE> <IRQ> [<ffffffff93f754ee>] queued_spin_lock_slowpath+0xb/0xf
[10162288.039524] [<ffffffff93f83ba7>] _raw_spin_lock_irqsave+0x37/0x40
[10162288.039529] [<ffffffff939c7dcd>] get_page_from_freelist+0x69d/0xaa0
[10162288.039533] [<ffffffff93ebb58f>] ? tcp_send_ack+0x11f/0x170
[10162288.039536] [<ffffffff939c8336>] __alloc_pages_nodemask+0x166/0x450
[10162288.039541] [<ffffffff93a16c28>] alloc_pages_current+0x98/0x110
[10162288.039546] [<ffffffff93a24f03>] new_slab+0x393/0x4e0
[10162288.039548] [<ffffffff93a2541c>] ___slab_alloc+0x3cc/0x520
[10162288.039561] [<ffffffffc03ecf0d>] ? bnx2_poll_work+0x9ad/0x1350 [bnx2]
[10162288.039564] [<ffffffff93e4fbd9>] ? __netif_receive_skb_core+0x729/0xa10
[10162288.039571] [<ffffffffc03ecf0d>] ? bnx2_poll_work+0x9ad/0x1350 [bnx2]
[10162288.039573] [<ffffffff93f77dbc>] __slab_alloc+0x40/0x5c
[10162288.039575] [<ffffffff93a26120>] __kmalloc+0x1c0/0x230
[10162288.039581] [<ffffffffc03ecf0d>] bnx2_poll_work+0x9ad/0x1350 [bnx2]
[10162288.039588] [<ffffffffc03ed8ea>] bnx2_poll_msix+0x3a/0xc0 [bnx2]
[10162288.039591] [<ffffffff93e5057f>] net_rx_action+0x26f/0x390
[10162288.039597] [<ffffffff938a54b5>] __do_softirq+0xf5/0x280
[10162288.039601] [<ffffffff93f9142c>] call_softirq+0x1c/0x30
[10162288.039604] [<ffffffff9382f715>] do_softirq+0x65/0xa0
[10162288.039606] [<ffffffff938a5835>] irq_exit+0x105/0x110
[10162289.039990] [<ffffffff93f929d8>] smp_apic_timer_interrupt+0x48/0x60
[10162289.039994] [<ffffffff93f8eefa>] apic_timer_interrupt+0x16a/0x170
[10162289.039995] NMI watchdog: Watchdog detected hard LOCKUP on cpu 62
[10162289.039998] <EOI> [<ffffffff938d51b4>] ? finish_task_switch+0x54/0x1c0
[10162289.040001] [<ffffffff93f80d92>] __schedule+0x402/0x840
[10162289.040003] [<ffffffff93f811f9>] schedule+0x29/0x70
[10162289.040006] [<ffffffff93912306>] futex_wait_queue_me+0xc6/0x130
[10162289.040009] [<ffffffff939130ab>] futex_wait+0x17b/0x280
[10162289.040011] [<ffffffff938d7959>] ? ttwu_do_wakeup+0x19/0xe0
[10162289.040014] [<ffffffff938c9fe0>] ? hrtimer_get_res+0x50/0x50
[10162289.040017] [<ffffffff939122e4>] ? futex_wait_queue_me+0xa4/0x130
[10162289.040020] [<ffffffff93914df6>] do_futex+0x106/0x5a0
[10162289.040023] [<ffffffff93915310>] SyS_futex+0x80/0x190
[10162289.040026] [<ffffffff93f8dede>] system_call_fastpath+0x25/0x2a
[10162289.040029] [<ffffffff93f8de21>] ? system_call_after_swapgs+0xae/0x146
[10162290.041036] Modules linked in: target_core_pscsi target_core_file target_core_iblock binfmt_misc tcp_diag inet_diag target_core_user uio rpcrdma ib_isert sunrpc iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm vfat fat intel_powerclamp coretemp kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper ocrdma(T) cdc_ether iTCO_wdt ib_core gpio_ich iTCO_vendor_support pcspkr cryptd usbnet lpc_ich mii sg ioatdma i2c_i801 i7core_edac ipmi_si ipmi_devintf pcc_cpufreq ipmi_msghandler acpi_cpufreq ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea ata_piix sysfillrect
[10162290.041050] sysimgblt igb fb_sys_fops ptp ttm crct10dif_pclmul drm crct10dif_common pps_core crc32c_intel dca libata serio_raw megaraid_sas be2net bnx2 drm_panel_orientation_quirks i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod
[10162290.041054] CPU: 62 PID: 4193 Comm: msgr-worker-0 Kdump: loaded Tainted: G I ------------ T 3.10.0-1062.21.1.el7.x86_64 #1
[10162290.041055] Hardware name: IBM System x3850 X5 -[7143SPM]-/Node 1, Processor Card, BIOS -[G0E173BUS-1.73]- 01/21/2012
[10162290.041057] task: ffff88adcf270000 ti: ffff88adca894000 task.ti: ffff88adca894000
[10162290.041064] RIP: 0010:[<ffffffff938d51b4>] [<ffffffff938d51b4>] finish_task_switch+0x54/0x1c0
[10162290.041065] RSP: 0018:ffff88adca897a98 EFLAGS: 00000282
[10162290.041066] RAX: ffff88adbd8341c0 RBX: ffff88adcf270068 RCX: 00000000c0000100
[10162290.041068] RDX: ffff88adca895fd8 RSI: ffff88adcf270000 RDI: ffff88bddf59ac80
[10162290.041069] RBP: ffff88adca897ab8 R08: ffff88adca894000 R09: 0000000000000001
[10162290.041070] R10: 0000000000000001 R11: ffff88bddb78ba00 R12: 0000000000000292
[10162290.041071] R13: ffff88adca897a48 R14: ffff88adca897a00 R15: ffff88adcf270068
[10162290.041073] FS: 00007fdbe5d7e700(0000) GS:ffff88bddf580000(0000) knlGS:0000000000000000
[10162290.041074] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10162290.041076] CR2: 00007f8401bee528 CR3: 000000105dd34000 CR4: 00000000000207e0
[10162290.041077] Call Trace:
[10162290.041082] [<ffffffff93f80d92>] __schedule+0x402/0x840
[10162290.041090] [<ffffffff93a3afd9>] ? __mem_cgroup_uncharge_common+0x49/0x2f0
[10162290.041093] [<ffffffff938d7c36>] __cond_resched+0x26/0x30
[10162290.041094] [<ffffffff93f8149a>] _cond_resched+0x3a/0x50
[10162290.041096] [<ffffffff93f804f2>] down_read+0x12/0x40
[10162290.041100] [<ffffffff93a00bd0>] rmap_walk+0x300/0x360
[10162290.041105] [<ffffffff93a2c1a8>] ? migrate_page_copy+0x128/0x380
[10162290.041107] [<ffffffff93a2a466>] remove_migration_ptes+0x46/0x70
[10162290.041110] [<ffffffff93a2b280>] ? migrate_vma_collect_pmd+0x610/0x610
[10162290.041112] [<ffffffff93a2c5bb>] move_to_new_page+0x13b/0x250
[10162290.041114] [<ffffffff93a00e9b>] ? try_to_unmap+0x8b/0xe0
[10162290.041116] [<ffffffff939ff130>] ? page_remove_rmap+0x160/0x160
[10162290.041118] [<ffffffff93a2defd>] migrate_pages+0x6dd/0x7f0
[10162290.041121] [<ffffffff93a2a830>] ? migrate_vma_collect.constprop.52+0xe0/0xe0
[10162290.041123] [<ffffffff93a2e97c>] migrate_misplaced_page+0xcc/0x100
[10162290.041128] [<ffffffff939f12ed>] do_numa_page+0x19d/0x250
[10162290.041130] [<ffffffff939f3c5b>] handle_mm_fault+0xadb/0xfb0
[10162290.041136] [<ffffffff93e2e731>] ? sock_aio_read+0x21/0x30
[10162290.041144] [<ffffffff93a49e93>] ? do_sync_read+0x93/0xe0
[10162290.041147] [<ffffffff93f88653>] __do_page_fault+0x213/0x500
[10162290.041150] [<ffffffff93f88975>] do_page_fault+0x35/0x90
[10162290.041152] [<ffffffff93f84ac9>] ? error_swapgs+0xaa/0xc0
[10162290.041154] [<ffffffff93f84778>] page_fault+0x28/0x30
[10162290.041175] Code: 8b 36 66 66 66 66 90 65 48 8b 34 25 80 0e 01 00 66 66 66 66 90 41 c7 45 28 00 00 00 00 48 89 df c6 07 00 66 66 66 90 fb 66 66 90 <66> 66 90 65 48 8b 04 25 80 0e 01 00 48 8b 98 78 01 00 00 48 85
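When triaging a log like the one above, it can help to count the machine check messages and see how they relate in time to the CMCI storm and the hard lockups. The following is a minimal sketch, not a Red Hat tool: the function name `summarize` and the `SAMPLE` variable are hypothetical, and the sample contains only a handful of lines copied from the log above. It assumes the standard `[seconds.microseconds]` dmesg timestamp prefix.

```python
import re

# Matches a dmesg line such as "[10162260.577124] CMCI storm detected: ..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# Matches the NMI watchdog message and captures the CPU number.
LOCKUP_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")


def summarize(log_text):
    """Summarize MCE, CMCI storm, and hard lockup messages in dmesg text.

    Hypothetical helper for illustration only.
    """
    mce_times = []
    storm_time = None
    lockup_cpus = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip lines without a timestamp prefix
        stamp = float(match.group(1))
        message = match.group(2)
        if "Machine check events logged" in message:
            mce_times.append(stamp)
        elif "CMCI storm detected" in message:
            if storm_time is None:
                storm_time = stamp  # keep the first storm only
        else:
            hit = LOCKUP_RE.search(message)
            if hit:
                lockup_cpus.append(int(hit.group(1)))
    return {
        "mce_count": len(mce_times),
        "first_mce": mce_times[0] if mce_times else None,
        "last_mce": mce_times[-1] if mce_times else None,
        "storm_time": storm_time,
        "lockup_cpus": lockup_cpus,
    }


# A few lines copied from the log above (abbreviated sample, not the full log).
SAMPLE = """\
[10162160.576292] mce: [Hardware Error]: Machine check events logged
[10162251.862879] mce: [Hardware Error]: Machine check events logged
[10162251.964029] mce: [Hardware Error]: Machine check events logged
[10162260.577124] CMCI storm detected: switching to poll mode
[10162288.039252] NMI watchdog: Watchdog detected hard LOCKUP on cpu 32
[10162289.039995] NMI watchdog: Watchdog detected hard LOCKUP on cpu 62
"""

if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

To run it against a full log, pass the saved kernel log text to `summarize()` instead of `SAMPLE`. The summary only describes what the log contains; it does not identify the faulty component.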
Environment
- Red Hat Enterprise Linux 7.7 (kernel-3.10.0-1062.21.1.el7)
- IBM System x3850 X5