Why do we get a soft lockup in multi_cpu_stop+0x7f on Red Hat Enterprise Linux 7.1?
Issue
- The server hangs with the following soft lockup error messages:
May 5 22:40:51 Host1 kernel: BUG: soft lockup - CPU#31 stuck for 23s! [migration/31:242]
May 5 22:41:19 Host1 kernel: BUG: soft lockup - CPU#8 stuck for 22s! [migration/8:115]
- During that time, CPU utilization is at 100% in %sys.
- The vmcore shows the soft lockup with the following backtrace:
#3 [ffff88123ac03ea0] watchdog_timer_fn at ffffffff8110a4f5
#4 [ffff88123ac03ed0] __run_hrtimer at ffffffff8109b1a7
#5 [ffff88123ac03f10] hrtimer_interrupt at ffffffff8109b9e7
#6 [ffff88123ac03f80] local_apic_timer_interrupt at ffffffff810441c7
#7 [ffff88123ac03f98] smp_apic_timer_interrupt at ffffffff8161634f
#8 [ffff88123ac03fb0] apic_timer_interrupt at ffffffff81614a1d
--- <IRQ stack> ---
#9 [ffff881238847ce8] apic_timer_interrupt at ffffffff81614a1d
[exception RIP: multi_cpu_stop+0x7f]
RIP: ffffffff810f26df RSP: ffff881238847d90 RFLAGS: 00000293
RAX: ffffffff81633ce0 RBX: ffff881238847d20 RCX: dead000000200200
RDX: 0000000000000001 RSI: 0000000000000282 RDI: ffff880663277af0
RBP: ffff881238847db0 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000001 R11: 0000000000000008 R12: 000000000000001a
R13: 000000000000001f R14: ffff8814bfd13680 R15: ffff8814bfd13680
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0000
#10 [ffff881238847db8] cpu_stopper_thread at ffffffff810f28e8
#11 [ffff881238847e80] smpboot_thread_fn at ffffffff8109fc7f
#12 [ffff881238847ec8] kthread at ffffffff8109726f
#13 [ffff881238847f50] ret_from_fork at ffffffff81613cfc
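When triaging a host in this state, it can help to extract the soft lockup lines from the logs and confirm that the stuck task is the per-CPU stopper thread (`migration/<cpu>`), which matches the `cpu_stopper_thread` -> `multi_cpu_stop` frames in the backtrace. The Python below is a minimal illustrative sketch, not a Red Hat tool: the function names are hypothetical, and it assumes the kernel default `watchdog_thresh` of 10 seconds, with the soft lockup detector reporting once a CPU has been stuck for more than twice that value.

```python
import re
from dataclasses import dataclass

# Matches log lines such as:
# "May 5 22:40:51 Host1 kernel: BUG: soft lockup - CPU#31 stuck for 23s! [migration/31:242]"
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Assumption: default watchdog_thresh of 10 s; the soft lockup threshold is 2 * watchdog_thresh.
DEFAULT_WATCHDOG_THRESH = 10


@dataclass
class SoftLockup:
    cpu: int       # CPU number reported as CPU#N
    seconds: int   # how long the CPU was stuck
    comm: str      # task name, e.g. "migration/31"
    pid: int       # task PID


def parse_soft_lockups(lines):
    """Return a SoftLockup record for every soft lockup message found in lines."""
    events = []
    for line in lines:
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            events.append(
                SoftLockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
            )
    return events


def is_stopper_thread(event):
    """True if the stuck task is the stopper kthread bound to the same CPU."""
    return event.comm == f"migration/{event.cpu}"


def exceeds_threshold(event, watchdog_thresh=DEFAULT_WATCHDOG_THRESH):
    """True if the reported duration is beyond the assumed soft lockup threshold."""
    return event.seconds > 2 * watchdog_thresh


if __name__ == "__main__":
    log = [
        "May 5 22:40:51 Host1 kernel: BUG: soft lockup - CPU#31 stuck for 23s! [migration/31:242]",
        "May 5 22:41:19 Host1 kernel: BUG: soft lockup - CPU#8 stuck for 22s! [migration/8:115]",
    ]
    for e in parse_soft_lockups(log):
        print(e.cpu, e.seconds, e.comm, is_stopper_thread(e), exceeds_threshold(e))
```

Run against the two messages from this system, each line reports a stopper thread (`True`) stuck past the assumed 20-second threshold (`True`). If `watchdog_thresh` has been tuned on the host, pass the actual value instead of the default.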
Environment
- Red Hat Enterprise Linux 7.1