A server crashes due to a hard lockup on a CPU where a task calling run_rebalance_domains() is stuck waiting on the rq.lock spinlock of another CPU

Solution Unverified - Updated -

Issue

  • The server crashed due to a hard lockup on CPU 0:
[1781582.852138] Kernel panic - not syncing: Hard LOCKUP
[1781582.852842] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: P           OE  ------------ T 3.10.0-1160.88.1.el7.x86_64 #1
[1781582.854240] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 08/04/2022
[1781582.854936] Call Trace:
[1781582.855611]  <NMI>  [<ffffffffba9b1bec>] dump_stack+0x19/0x1f
[1781582.856287]  [<ffffffffba9ab708>] panic+0xe8/0x21f
[1781582.856948]  [<ffffffffba230a78>] ? show_regs+0x58/0x290
[1781582.857598]  [<ffffffffba29f523>] nmi_panic+0x43/0x50
[1781582.858230]  [<ffffffffba357409>] watchdog_overflow_callback+0x119/0x140
[1781582.858858]  [<ffffffffba3b32a7>] __perf_event_overflow+0x57/0x100
[1781582.859480]  [<ffffffffba3bcd64>] perf_event_overflow+0x14/0x20
[1781582.860102]  [<ffffffffba20acf0>] handle_pmi_common+0x1a0/0x260
[1781582.860727]  [<ffffffffba59eb48>] ? ioremap_page_range+0x2e8/0x490
[1781582.925580]  [<ffffffffba411b54>] ? vunmap_page_range+0x234/0x470
[1781582.926208]  [<ffffffffba66af66>] ? ghes_copy_tofrom_phys+0x116/0x220
[1781582.926828]  [<ffffffffba20afef>] intel_pmu_handle_irq+0xcf/0x1d0
[1781582.927441]  [<ffffffffba9bb039>] perf_event_nmi_handler+0x39/0x60
[1781582.928050]  [<ffffffffba9bc9cc>] nmi_handle.isra.0+0x8c/0x150
[1781582.928660]  [<ffffffffba9bcca8>] do_nmi+0x218/0x460
[1781582.929276]  [<ffffffffba9bbdf4>] end_repeat_nmi+0x1e/0x81
[1781582.929884]  [<ffffffffba31ec86>] ? native_queued_spin_lock_slowpath+0x1d6/0x200
[1781582.930504]  [<ffffffffba31ec86>] ? native_queued_spin_lock_slowpath+0x1d6/0x200
[1781582.931109]  [<ffffffffba31ec86>] ? native_queued_spin_lock_slowpath+0x1d6/0x200
[1781582.931705]  <EOE>  <IRQ>  [<ffffffffba9ac21a>] queued_spin_lock_slowpath+0xb/0x13
[1781582.932312]  [<ffffffffba9ba60c>] _raw_spin_lock_irq+0x2c/0x40
[1781582.932921]  [<ffffffffba2f0771>] run_rebalance_domains+0x171/0x1e0
[1781582.933534]  [<ffffffffba2a9595>] __do_softirq+0xf5/0x290
[1781582.934147]  [<ffffffffba9c8aac>] call_softirq+0x1c/0x30
[1781582.934759]  [<ffffffffba230825>] do_softirq+0x65/0xa0
[1781582.935369]  [<ffffffffba2a9945>] irq_exit+0x115/0x120
[1781582.935975]  [<ffffffffba2de045>] scheduler_ipi+0x75/0x1a0
[1781582.936580]  [<ffffffffba25c2df>] smp_reschedule_interrupt+0x2f/0x40
[1781582.937180]  [<ffffffffba9c7d72>] reschedule_interrupt+0x172/0x180
[1781582.937778]  <EOI>  [<ffffffffba7ebe97>] ? cpuidle_enter_state+0x57/0xd0
[1781582.938399]  [<ffffffffba7ebe8d>] ? cpuidle_enter_state+0x4d/0xd0
[1781582.939017]  [<ffffffffba7ebfee>] cpuidle_idle_call+0xde/0x230
[1781582.939636]  [<ffffffffba23955e>] arch_cpu_idle+0xe/0xc0
[1781582.940252]  [<ffffffffba30820a>] cpu_startup_entry+0x14a/0x1e0
[1781582.940873]  [<ffffffffba9a03f7>] rest_init+0x77/0x80
[1781582.941492]  [<ffffffffbaf8c22b>] start_kernel+0x44b/0x470
[1781582.942108]  [<ffffffffbaf8bbc8>] ? repair_env_string+0x64/0x64
[1781582.942725]  [<ffffffffbaf8b120>] ? early_idt_handler_array+0x120/0x120
[1781582.943342]  [<ffffffffbaf8b748>] x86_64_start_reservations+0x24/0x2a
[1781582.943964]  [<ffffffffbaf8b8a2>] x86_64_start_kernel+0x154/0x17b
[1781582.944584]  [<ffffffffba2000d5>] start_cpu+0x5/0x14
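
The trace above stacks three contexts on CPU 0: the NMI watchdog handler (between <NMI> and <EOE>), the interrupt/softirq path (between <IRQ> and <EOI>), and the interrupted idle task. The following is a minimal sketch, not a Red Hat tool, of how the function waiting on the spinlock could be picked out of such a trace. The names parse_trace and lock_waiter are hypothetical, the sample is a subset of the lines above, and the marker handling is a simplification that assumes the el7 x86_64 trace layout shown here.

```python
import re

# One frame, e.g. "[<ffffffffba2f0771>] run_rebalance_domains+0x171/0x1e0".
# A leading "?" marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
# Stack-switch markers printed inline with the frames.
MARKER_RE = re.compile(r"<(NMI|EOE|IRQ|EOI)>")
# Assumption: treat the end of an exception or interrupt stack as a return
# to task context unless another marker follows on the same line.
NEXT_CONTEXT = {"NMI": "nmi", "EOE": "task", "IRQ": "irq", "EOI": "task"}


def parse_trace(lines):
    """Return a list of (context, function, reliable) tuples, top frame first."""
    context = "task"
    frames = []
    for line in lines:
        for marker in MARKER_RE.findall(line):
            context = NEXT_CONTEXT[marker]
        match = FRAME_RE.search(line)
        if match:
            reliable = match.group(1) is None
            frames.append((context, match.group(2), reliable))
    return frames


def lock_waiter(frames):
    """Return (context, function) of the first confirmed caller below the spinlock frames."""
    in_lock = False
    for context, func, reliable in frames:
        # Skip the watchdog's own NMI stack and unconfirmed "?" frames.
        if context == "nmi" or not reliable:
            continue
        if "spin_lock" in func:
            in_lock = True
        elif in_lock:
            return context, func
    return None


# Subset of the panic log above, copied verbatim.
SAMPLE = """\
[1781582.855611]  <NMI>  [<ffffffffba9b1bec>] dump_stack+0x19/0x1f
[1781582.929276]  [<ffffffffba9bbdf4>] end_repeat_nmi+0x1e/0x81
[1781582.929884]  [<ffffffffba31ec86>] ? native_queued_spin_lock_slowpath+0x1d6/0x200
[1781582.931705]  <EOE>  <IRQ>  [<ffffffffba9ac21a>] queued_spin_lock_slowpath+0xb/0x13
[1781582.932312]  [<ffffffffba9ba60c>] _raw_spin_lock_irq+0x2c/0x40
[1781582.932921]  [<ffffffffba2f0771>] run_rebalance_domains+0x171/0x1e0
[1781582.933534]  [<ffffffffba2a9595>] __do_softirq+0xf5/0x290
"""

print(lock_waiter(parse_trace(SAMPLE.splitlines())))
```

For this sample the sketch reports run_rebalance_domains in the irq context, which matches the title of this article. It only identifies the waiter on the locked-up CPU; finding which CPU holds the contended rq.lock requires analysis of the vmcore.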

Environment

  • Red Hat Enterprise Linux 7.9.z - kernel-3.10.0-1160.88.1.el7
  • HPE ProLiant
