Deadlock scenario on CPU runqueue lock between update_blocked_averages() and try_to_wake_up().
Issue
- Kernel crashes with a "hard LOCKUP" while the panicking CPU is waiting for a runqueue spinlock or a workqueue spinlock. (See a few example backtraces in Diagnostic Steps.)
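- At the core of the issue, a CPU deadlocks on its own runqueue lock between update_blocked_averages() and try_to_wake_up(). In the backtrace below, update_blocked_averages() emits a warning while holding the lock, and the printk() console path queues work whose wakeup, in try_to_wake_up(), spins on that same lock: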
PID: 2345 TASK: ffff8b1dfd825000 CPU: 11 COMMAND: "CPU 2/KVM"
#0 [fffffe000023ae48] crash_nmi_callback at ffffffff9f857853
#1 [fffffe000023ae50] nmi_handle at ffffffff9f826e03
#2 [fffffe000023aea8] default_do_nmi at ffffffffa01936b9
#3 [fffffe000023aec8] do_nmi at ffffffff9f82733f
#4 [fffffe000023aef0] end_repeat_nmi at ffffffffa02015a4
[exception RIP: native_queued_spin_lock_slowpath+0x5b]
RIP: ffffffff9f94e30b RSP: ffffa070cc838a78 RFLAGS: 00000002
RAX: 00000000001c0101 RBX: ffff8b3c89905000 RCX: 000000000000000b
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8b5bffb6ae40
RBP: ffff8b5bffb6ae40 R8: ffff8b5bffb6a760 R9: ffff8b1d00403930
R10: 0000000000000000 R11: ffffffffa105b548 R12: 0000000000000000
R13: ffff8b3c89905bbc R14: 0000000000000087 R15: 000000000000000b
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffa070cc838a78] native_queued_spin_lock_slowpath at ffffffff9f94e30b
#6 [ffffa070cc838a78] _raw_spin_lock at ffffffffa01a667a
#7 [ffffa070cc838a80] try_to_wake_up at ffffffff9f91fcbd
#8 [ffffa070cc838ae0] __queue_work at ffffffff9f90aa5d
#9 [ffffa070cc838b28] queue_work_on at ffffffff9f90ad34
#10 [ffffa070cc838b38] soft_cursor at ffffffff9fd35834
#11 [ffffa070cc838b90] bit_cursor at ffffffff9fd35462
#12 [ffffa070cc838c58] hide_cursor at ffffffff9fdcd22a
#13 [ffffa070cc838c68] vt_console_print at ffffffff9fdcf99d
#14 [ffffa070cc838cd0] console_unlock at ffffffff9f959e1f
#15 [ffffa070cc838d90] vprintk_emit at ffffffff9f95bf1d
#16 [ffffa070cc838de0] printk at ffffffff9f95c504
#17 [ffffa070cc838e40] __warn_printk at ffffffff9f8ed73f
#18 [ffffa070cc838ea8] update_blocked_averages at ffffffff9f92d1cf
#19 [ffffa070cc838f18] run_rebalance_domains at ffffffff9f933e01
#20 [ffffa070cc838f70] __softirqentry_text_start at ffffffffa04000d7
#21 [ffffa070cc838fc0] irq_exit_rcu at ffffffff9f8f3f3b
#22 [ffffa070cc838fd0] irq_exit at ffffffff9f8f3f4a
#23 [ffffa070cc838fd8] smp_apic_timer_interrupt at ffffffffa02026c4
#24 [ffffa070cc838ff0] apic_timer_interrupt at ffffffffa0201c4f
--- <IRQ stack> ---
#25 [ffffa070cf8e3c38] apic_timer_interrupt at ffffffffa0201c4f
[exception RIP: vmx_do_interrupt_nmi_irqoff+0x26]
RIP: ffffffffc1b155d6 RSP: ffffa070cf8e3ce0 RFLAGS: 00000082
RAX: 0000000000001c40 RBX: ffff8b1d08b5aa80 RCX: 0000e5f36127e9a0
RDX: ffffffff00000000 RSI: 00000000800000ec RDI: ffffffffa0201c40
RBP: ffffa070cf8e3ce0 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: ffff8b209a6c0200
ORIG_RAX: ffffffffffffff13 CS: 0010 SS: 0018
#26 [ffffa070cf8e3ce8] vmx_handle_exit_irqoff at ffffffffc1b053a7 [kvm_intel]
#27 [ffffa070cf8e3cf8] vcpu_enter_guest at ffffffffc1024ea6 [kvm]
#28 [ffffa070cf8e3da0] kvm_arch_vcpu_ioctl_run at ffffffffc102840f [kvm]
#29 [ffffa070cf8e3dd0] kvm_vcpu_ioctl at ffffffffc1001738 [kvm]
#30 [ffffa070cf8e3e80] do_vfs_ioctl at ffffffff9fb54694
#31 [ffffa070cf8e3ef8] ksys_ioctl at ffffffff9fb54cd0
#32 [ffffa070cf8e3f30] __x64_sys_ioctl at ffffffff9fb54d16
#33 [ffffa070cf8e3f38] do_syscall_64 at ffffffff9f80430b
#34 [ffffa070cf8e3f50] entry_SYSCALL_64_after_hwframe at ffffffffa02000ad
RIP: 00007fa1fff9b6db RSP: 00007fa1f09726e8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 000055904a4b9130 RCX: 00007fa1fff9b6db
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000025
RBP: 0000000000000000 R8: 0000559047e0ef88 R9: 00000000ffffffff
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 0000559047e3f720 R14: 0000000000000000 R15: 00007fa203527000
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
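The self-deadlock in this backtrace can be modelled in userspace. The following is a minimal sketch, not kernel code, and it rests on several assumptions. A Python threading.Lock stands in for the CPU's raw runqueue spinlock, since both are non-reentrant. The function names only mirror frames #18, #17 and #7 of the backtrace, and the printk/console/workqueue frames in between are collapsed into one call. The timeout on the second acquire exists only so the sketch terminates; the real CPU spins in native_queued_spin_lock_slowpath until the hard-lockup detector fires.

```python
import threading

# Model only: this lock stands in for one CPU's runqueue raw spinlock.
# Like a raw spinlock it is not reentrant, so the holder can never
# acquire it a second time.
rq_lock = threading.Lock()
call_path = []


def try_to_wake_up():
    """Waking a task queued on this CPU needs this CPU's rq lock."""
    call_path.append("try_to_wake_up")
    # The kernel spins forever at this point; the model gives up after a
    # short timeout so the self-deadlock is reported instead of hanging.
    if not rq_lock.acquire(timeout=0.1):
        return False
    rq_lock.release()
    return True


def warn_printk():
    """Collapses __warn_printk -> printk -> console_unlock ->
    vt_console_print -> ... -> queue_work_on -> __queue_work."""
    call_path.append("__warn_printk")
    return try_to_wake_up()


def update_blocked_averages():
    """A warning fires while the rq lock is held."""
    call_path.append("update_blocked_averages")
    with rq_lock:
        return warn_printk()


woke = update_blocked_averages()
print("wakeup completed" if woke else "self-deadlock on rq lock")
```

No second CPU is involved: the lock holder and the waiter are the same execution context on the same CPU, so nothing can ever release the lock.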
Environment
- Red Hat Enterprise Linux 8
- Kernel older than version 4.18.0-372.16.1.el8_6
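Deciding whether a given kernel is in the affected range means comparing its release string with 4.18.0-372.16.1.el8_6. The following is a rough sketch of that comparison. It assumes the architecture suffix (for example ".x86_64" in `uname -r` output) has already been stripped. It only approximates RPM's version ordering: runs of digits and runs of letters are compared segment by segment, and numeric segments are compared as integers. It ignores epochs, "~", "^" and other corner cases, so `rpmdev-vercmp` from rpmdevtools remains the authoritative check. The helper names are hypothetical.

```python
import re

# Hypothetical constant: the first kernel release no longer affected.
FIXED_RELEASE = "4.18.0-372.16.1.el8_6"


def _segments(version):
    # Runs of digits or ASCII letters; separators such as '.', '-', '_' are dropped.
    return re.findall(r"\d+|[A-Za-z]+", version)


def vercmp(a, b):
    """Simplified RPM-style comparison: returns -1, 0 or 1."""
    seg_a, seg_b = _segments(a), _segments(b)
    for sa, sb in zip(seg_a, seg_b):
        if sa.isdigit() and sb.isdigit():
            ia, ib = int(sa), int(sb)
            if ia != ib:
                return -1 if ia < ib else 1
        elif sa.isdigit() != sb.isdigit():
            # As in RPM, a numeric segment sorts after an alphabetic one.
            return 1 if sa.isdigit() else -1
        elif sa != sb:
            return -1 if sa < sb else 1
    # All shared segments equal: the string with segments left over is newer.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))


def is_affected(release):
    """True if the kernel release (arch suffix stripped) predates the fix."""
    return vercmp(release, FIXED_RELEASE) < 0
```

For example, `is_affected("4.18.0-372.9.1.el8")` is True, while `is_affected("4.18.0-425.3.1.el8")` is False.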