Multiple Supermicro servers crash every night due to hard LOCKUPs.
Issue
- Multiple Supermicro servers crash every night with a "Hard LOCKUP" kernel panic:
[270762.593673] Kernel panic - not syncing: Hard LOCKUP
[270762.599924] CPU: 29 PID: 14467 Comm: etcd Kdump: loaded Tainted: P OE ------------ 3.10.0-957.5.1.el7.x86_64 #1
[270762.614814] Hardware name: Supermicro PIO-648R-E1CR36L+-ST031/X10DRi-T4+, BIOS 3.1 06/08/2018
[270762.627708] Call Trace:
[270762.636000] [] dump_stack+0x19/0x1b
[270762.644319] [] panic+0xe8/0x21f
[270762.650610] [] ? page_fault+0x28/0x30
[270762.657047] [] nmi_panic+0x3f/0x40
[270762.665622] [] watchdog_overflow_callback+0x121/0x140
[270762.674092] [] __perf_event_overflow+0x57/0x100
[270762.682718] [] perf_event_overflow+0x14/0x20
[270762.689076] [] intel_pmu_handle_irq+0x220/0x510
[270762.695503] [] ? ioremap_page_range+0x2e8/0x480
[270762.703863] [] ? vunmap_page_range+0x234/0x470
[270762.712320] [] ? ghes_copy_tofrom_phys+0x116/0x210
[270762.718675] [] ? ghes_read_estatus+0xa0/0x190
[270762.725113] [] perf_event_nmi_handler+0x31/0x50
[270762.733581] [] nmi_handle.isra.0+0x8c/0x150
[270762.741970] [] do_nmi+0x15d/0x460
[270762.748271] [] end_repeat_nmi+0x1e/0x81
[270762.754671] [] ? do_numa_page+0x148/0x250
[270762.763196] [] ? do_numa_page+0x148/0x250
[270762.771593] [] ? do_numa_page+0x148/0x250
[270762.777883] [] handle_pte_fault+0x316/0xd10
[270762.784285] [] handle_mm_fault+0x39d/0x9b0
[270762.792700] [] __do_page_fault+0x203/0x500
[270762.801113] [] do_page_fault+0x35/0x90
[270762.982926] [] page_fault+0x28/0x30
Environment
- Red Hat Enterprise Linux 7.6 (kernel-3.10.0-957.5.1.el7)
- Supermicro PIO-648R-E1CR36L+-ST031