A server crashes due to a hard lockup on a CPU where a task calling run_rebalance_domains() is stuck waiting on the rq.lock spinlock of another CPU
Issue
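- The server crashed due to a hard lockup on CPU 0. The call trace shows the CPU spinning in _raw_spin_lock_irq(), called from run_rebalance_domains() in softirq context, when the NMI watchdog fired: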
[1781582.852138] Kernel panic - not syncing: Hard LOCKUP
[1781582.852842] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: P OE ------------ T 3.10.0-1160.88.1.el7.x86_64 #1
[1781582.854240] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 08/04/2022
[1781582.854936] Call Trace:
[1781582.855611] <NMI> [<ffffffffba9b1bec>] dump_stack+0x19/0x1f
[1781582.856287] [<ffffffffba9ab708>] panic+0xe8/0x21f
[1781582.856948] [<ffffffffba230a78>] ? show_regs+0x58/0x290
[1781582.857598] [<ffffffffba29f523>] nmi_panic+0x43/0x50
[1781582.858230] [<ffffffffba357409>] watchdog_overflow_callback+0x119/0x140
[1781582.858858] [<ffffffffba3b32a7>] __perf_event_overflow+0x57/0x100
[1781582.859480] [<ffffffffba3bcd64>] perf_event_overflow+0x14/0x20
[1781582.860102] [<ffffffffba20acf0>] handle_pmi_common+0x1a0/0x260
[1781582.860727] [<ffffffffba59eb48>] ? ioremap_page_range+0x2e8/0x490
[1781582.925580] [<ffffffffba411b54>] ? vunmap_page_range+0x234/0x470
[1781582.926208] [<ffffffffba66af66>] ? ghes_copy_tofrom_phys+0x116/0x220
[1781582.926828] [<ffffffffba20afef>] intel_pmu_handle_irq+0xcf/0x1d0
[1781582.927441] [<ffffffffba9bb039>] perf_event_nmi_handler+0x39/0x60
[1781582.928050] [<ffffffffba9bc9cc>] nmi_handle.isra.0+0x8c/0x150
[1781582.928660] [<ffffffffba9bcca8>] do_nmi+0x218/0x460
[1781582.929276] [<ffffffffba9bbdf4>] end_repeat_nmi+0x1e/0x81
[1781582.929884] [<ffffffffba31ec86>] ? native_queued_spin_lock_slowpath+0x1d6/0x200
[1781582.930504] [<ffffffffba31ec86>] ? native_queued_spin_lock_slowpath+0x1d6/0x200
[1781582.931109] [<ffffffffba31ec86>] ? native_queued_spin_lock_slowpath+0x1d6/0x200
[1781582.931705] <EOE> <IRQ> [<ffffffffba9ac21a>] queued_spin_lock_slowpath+0xb/0x13
[1781582.932312] [<ffffffffba9ba60c>] _raw_spin_lock_irq+0x2c/0x40
[1781582.932921] [<ffffffffba2f0771>] run_rebalance_domains+0x171/0x1e0
[1781582.933534] [<ffffffffba2a9595>] __do_softirq+0xf5/0x290
[1781582.934147] [<ffffffffba9c8aac>] call_softirq+0x1c/0x30
[1781582.934759] [<ffffffffba230825>] do_softirq+0x65/0xa0
[1781582.935369] [<ffffffffba2a9945>] irq_exit+0x115/0x120
[1781582.935975] [<ffffffffba2de045>] scheduler_ipi+0x75/0x1a0
[1781582.936580] [<ffffffffba25c2df>] smp_reschedule_interrupt+0x2f/0x40
[1781582.937180] [<ffffffffba9c7d72>] reschedule_interrupt+0x172/0x180
[1781582.937778] <EOI> [<ffffffffba7ebe97>] ? cpuidle_enter_state+0x57/0xd0
[1781582.938399] [<ffffffffba7ebe8d>] ? cpuidle_enter_state+0x4d/0xd0
[1781582.939017] [<ffffffffba7ebfee>] cpuidle_idle_call+0xde/0x230
[1781582.939636] [<ffffffffba23955e>] arch_cpu_idle+0xe/0xc0
[1781582.940252] [<ffffffffba30820a>] cpu_startup_entry+0x14a/0x1e0
[1781582.940873] [<ffffffffba9a03f7>] rest_init+0x77/0x80
[1781582.941492] [<ffffffffbaf8c22b>] start_kernel+0x44b/0x470
[1781582.942108] [<ffffffffbaf8bbc8>] ? repair_env_string+0x64/0x64
[1781582.942725] [<ffffffffbaf8b120>] ? early_idt_handler_array+0x120/0x120
[1781582.943342] [<ffffffffbaf8b748>] x86_64_start_reservations+0x24/0x2a
[1781582.943964] [<ffffffffbaf8b8a2>] x86_64_start_kernel+0x154/0x17b
[1781582.944584] [<ffffffffba2000d5>] start_cpu+0x5/0x14
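
When reading a trace like the one above, it can help to separate the reliable frames from those marked with "?", which the unwinder could not confirm. The following is a minimal sketch, assuming the log lines keep the "[<address>] symbol+0xoffset/0xsize" layout shown here. The names FRAME_RE, parse_frames, reliable_symbols, and SAMPLE are hypothetical helpers introduced for illustration and are not part of any kernel or crash tooling.

```python
import re

# One frame looks like "[<ffffffffba2f0771>] run_rebalance_domains+0x171/0x1e0".
# A "?" before the symbol marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return (symbol, offset, reliable) tuples in the order they appear."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a stack frame (e.g. the "Call Trace:" header)
        frames.append(
            (
                match.group("symbol"),
                int(match.group("offset"), 16),
                match.group("unreliable") is None,
            )
        )
    return frames


def reliable_symbols(log_text):
    """Return only the symbols of frames that are not marked with '?'."""
    return [symbol for symbol, _offset, reliable in parse_frames(log_text) if reliable]


# Four lines copied from the trace above, around the NMI/IRQ boundary.
SAMPLE = """\
[1781582.931109] [<ffffffffba31ec86>] ? native_queued_spin_lock_slowpath+0x1d6/0x200
[1781582.931705] <EOE> <IRQ> [<ffffffffba9ac21a>] queued_spin_lock_slowpath+0xb/0x13
[1781582.932312] [<ffffffffba9ba60c>] _raw_spin_lock_irq+0x2c/0x40
[1781582.932921] [<ffffffffba2f0771>] run_rebalance_domains+0x171/0x1e0
"""

if __name__ == "__main__":
    for name in reliable_symbols(SAMPLE):
        print(name)
```

Run against the sample lines, the reliable frames show the lock slowpath entered from _raw_spin_lock_irq(), which was in turn called from run_rebalance_domains(), matching the symptom described in the title.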
Environment
- Red Hat Enterprise Linux 7.9.z - kernel-3.10.0-1160.88.1.el7
- HPE ProLiant