Deadlock scenario on CPU runqueue lock between update_blocked_averages() and try_to_wake_up().

Solution Verified

Issue

  • The kernel crashes with "hard LOCKUP" while the panicking CPU is waiting for a runqueue spinlock or workqueue spinlock. (See a few example backtraces in Diagnostic Steps.)

  • The core of the issue is a CPU deadlocking on its own runqueue lock between update_blocked_averages() and try_to_wake_up():

PID: 2345   TASK: ffff8b1dfd825000  CPU: 11  COMMAND: "CPU 2/KVM"
 #0 [fffffe000023ae48] crash_nmi_callback at ffffffff9f857853
 #1 [fffffe000023ae50] nmi_handle at ffffffff9f826e03
 #2 [fffffe000023aea8] default_do_nmi at ffffffffa01936b9
 #3 [fffffe000023aec8] do_nmi at ffffffff9f82733f
 #4 [fffffe000023aef0] end_repeat_nmi at ffffffffa02015a4
    [exception RIP: native_queued_spin_lock_slowpath+0x5b]
    RIP: ffffffff9f94e30b  RSP: ffffa070cc838a78  RFLAGS: 00000002
    RAX: 00000000001c0101  RBX: ffff8b3c89905000  RCX: 000000000000000b
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff8b5bffb6ae40
    RBP: ffff8b5bffb6ae40   R8: ffff8b5bffb6a760   R9: ffff8b1d00403930
    R10: 0000000000000000  R11: ffffffffa105b548  R12: 0000000000000000
    R13: ffff8b3c89905bbc  R14: 0000000000000087  R15: 000000000000000b
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffffa070cc838a78] native_queued_spin_lock_slowpath at ffffffff9f94e30b
 #6 [ffffa070cc838a78] _raw_spin_lock at ffffffffa01a667a
 #7 [ffffa070cc838a80] try_to_wake_up at ffffffff9f91fcbd
 #8 [ffffa070cc838ae0] __queue_work at ffffffff9f90aa5d
 #9 [ffffa070cc838b28] queue_work_on at ffffffff9f90ad34
#10 [ffffa070cc838b38] soft_cursor at ffffffff9fd35834
#11 [ffffa070cc838b90] bit_cursor at ffffffff9fd35462
#12 [ffffa070cc838c58] hide_cursor at ffffffff9fdcd22a
#13 [ffffa070cc838c68] vt_console_print at ffffffff9fdcf99d
#14 [ffffa070cc838cd0] console_unlock at ffffffff9f959e1f
#15 [ffffa070cc838d90] vprintk_emit at ffffffff9f95bf1d
#16 [ffffa070cc838de0] printk at ffffffff9f95c504
#17 [ffffa070cc838e40] __warn_printk at ffffffff9f8ed73f
#18 [ffffa070cc838ea8] update_blocked_averages at ffffffff9f92d1cf
#19 [ffffa070cc838f18] run_rebalance_domains at ffffffff9f933e01
#20 [ffffa070cc838f70] __softirqentry_text_start at ffffffffa04000d7
#21 [ffffa070cc838fc0] irq_exit_rcu at ffffffff9f8f3f3b
#22 [ffffa070cc838fd0] irq_exit at ffffffff9f8f3f4a
#23 [ffffa070cc838fd8] smp_apic_timer_interrupt at ffffffffa02026c4
#24 [ffffa070cc838ff0] apic_timer_interrupt at ffffffffa0201c4f
--- <IRQ stack> ---
#25 [ffffa070cf8e3c38] apic_timer_interrupt at ffffffffa0201c4f
    [exception RIP: vmx_do_interrupt_nmi_irqoff+0x26]
    RIP: ffffffffc1b155d6  RSP: ffffa070cf8e3ce0  RFLAGS: 00000082
    RAX: 0000000000001c40  RBX: ffff8b1d08b5aa80  RCX: 0000e5f36127e9a0
    RDX: ffffffff00000000  RSI: 00000000800000ec  RDI: ffffffffa0201c40
    RBP: ffffa070cf8e3ce0   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000001  R15: ffff8b209a6c0200
    ORIG_RAX: ffffffffffffff13  CS: 0010  SS: 0018
#26 [ffffa070cf8e3ce8] vmx_handle_exit_irqoff at ffffffffc1b053a7 [kvm_intel]
#27 [ffffa070cf8e3cf8] vcpu_enter_guest at ffffffffc1024ea6 [kvm]
#28 [ffffa070cf8e3da0] kvm_arch_vcpu_ioctl_run at ffffffffc102840f [kvm]
#29 [ffffa070cf8e3dd0] kvm_vcpu_ioctl at ffffffffc1001738 [kvm]
#30 [ffffa070cf8e3e80] do_vfs_ioctl at ffffffff9fb54694
#31 [ffffa070cf8e3ef8] ksys_ioctl at ffffffff9fb54cd0
#32 [ffffa070cf8e3f30] __x64_sys_ioctl at ffffffff9fb54d16
#33 [ffffa070cf8e3f38] do_syscall_64 at ffffffff9f80430b
#34 [ffffa070cf8e3f50] entry_SYSCALL_64_after_hwframe at ffffffffa02000ad
    RIP: 00007fa1fff9b6db  RSP: 00007fa1f09726e8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 000055904a4b9130  RCX: 00007fa1fff9b6db
    RDX: 0000000000000000  RSI: 000000000000ae80  RDI: 0000000000000025
    RBP: 0000000000000000   R8: 0000559047e0ef88   R9: 00000000ffffffff
    R10: 0000000000000001  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000559047e3f720  R14: 0000000000000000  R15: 00007fa203527000
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
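
The call chain in frames #7 through #18 can be modelled in user space. The following is only a minimal sketch, not kernel code: it assumes that update_blocked_averages() holds the CPU's runqueue lock when the warning is printed, as the issue description implies, and it uses Python's non-reentrant threading.Lock as a stand-in for the runqueue spinlock. The Python function names mirror the backtrace for readability, and rq_lock and events are hypothetical names. Where the real try_to_wake_up() spins until the NMI watchdog reports a hard LOCKUP, the model makes a single non-blocking attempt so that it reports the deadlock instead of hanging.

```python
import threading

# Hypothetical stand-in for the per-CPU runqueue lock. threading.Lock is
# non-reentrant, like a kernel spinlock: the holder cannot take it again.
rq_lock = threading.Lock()
events = []


def try_to_wake_up():
    # The real function spins on the lock forever. The model tries once,
    # without blocking, so the self-deadlock is reported rather than reproduced.
    if not rq_lock.acquire(blocking=False):
        events.append("try_to_wake_up: runqueue lock already held by this CPU")
        return False
    try:
        events.append("try_to_wake_up: worker woken")
        return True
    finally:
        rq_lock.release()


def warn_printk():
    # Models __warn_printk -> printk -> console_unlock -> ... ->
    # queue_work_on -> __queue_work -> try_to_wake_up from the backtrace.
    return try_to_wake_up()


def update_blocked_averages(trigger_warning):
    # Assumption: the runqueue lock is held for the whole update, so a
    # warning printed here runs with the lock taken.
    with rq_lock:
        if trigger_warning:
            return warn_printk()
        return True
```

Calling update_blocked_averages(False) completes normally, while update_blocked_averages(True) returns False because the wake-up path finds the lock already taken by its own caller. The recorded messages are available in the events list.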

Environment

  • Red Hat Enterprise Linux 8
  • Kernels older than version 4.18.0-372.16.1.el8_6
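
To check a system against the environment listed above, the kernel release string can be compared with 4.18.0-372.16.1.el8_6. The sketch below is an illustration under stated assumptions: is_affected() and FIXED are hypothetical names, and the comparison is a simplified numeric one on the fields before ".el", not the RPM version comparison algorithm. It treats every RHEL 8 kernel with a lower number as affected, exactly as the list above states, and does not know about any separate backports.

```python
import re

# Version taken from the Environment list above.
FIXED = "4.18.0-372.16.1.el8_6"


def _key(release):
    # Keep only the numeric fields before ".el", for example
    # "4.18.0-372.9.1.el8.x86_64" becomes (4, 18, 0, 372, 9, 1).
    head = release.split(".el", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", head))


def is_affected(release, fixed=FIXED):
    # Simplified check (assumption): RHEL 8 kernels only, plain tuple
    # comparison of the numeric version fields.
    return ".el8" in release and _key(release) < _key(fixed)
```

On a running system the release string can be obtained with platform.release() from the standard library and passed to is_affected().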
