Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu xx due to looping in 'sched_cfs_period_timer'
Issue
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu xx...
Typical stack trace:
<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 16
<4>Pid: 0, comm: swapper Tainted: P --------------- 2.6.32-358.11.1.el6.x86_64 #1
<4>Call Trace:
<4> <NMI> [<ffffffff8150d4f8>] ? panic+0xa7/0x16f
<4> [<ffffffff810e112d>] ? watchdog_overflow_callback+0xcd/0xd0
<4> [<ffffffff81116f30>] ? __perf_event_overflow+0xb0/0x2a0
<4> [<ffffffff8101a89d>] ? x86_perf_event_update+0x5d/0xb0
<4> [<ffffffff8101b82d>] ? x86_perf_event_set_period+0xdd/0x170
<4> [<ffffffff81117554>] ? perf_event_overflow+0x14/0x20
<4> [<ffffffff810208c2>] ? intel_pmu_handle_irq+0x192/0x300
<4> [<ffffffff815130d6>] ? kprobe_exceptions_notify+0x16/0x430
<4> [<ffffffff81511c49>] ? perf_event_nmi_handler+0x39/0xb0
<4> [<ffffffff81513705>] ? notifier_call_chain+0x55/0x80
<4> [<ffffffff8151376a>] ? atomic_notifier_call_chain+0x1a/0x20
<4> [<ffffffff8109cc1e>] ? notify_die+0x2e/0x30
<4> [<ffffffff815113cb>] ? do_nmi+0x1bb/0x340
<4> [<ffffffff81510c90>] ? nmi+0x20/0x30
<4> [<ffffffff810657e1>] ? enqueue_entity+0x1/0x410
<4> <<EOE>> <IRQ> [<ffffffff81065dfb>] ? unthrottle_cfs_rq+0x10b/0x190
<4> [<ffffffffa034e62d>] ? __stp_time_timer_callback+0xbd/0xe0 [stap_13e229648ae4daabe7ad13af6149179_105193]
<4> [<ffffffff81065f2b>] ? distribute_cfs_runtime+0xab/0xd0
<4> [<ffffffff8106614d>] ? sched_cfs_period_timer+0x11d/0x160
<4> [<ffffffff81066030>] ? sched_cfs_period_timer+0x0/0x160
<4> [<ffffffff8109b3ae>] ? __run_hrtimer+0x8e/0x1a0
<4> [<ffffffff810a209f>] ? ktime_get_update_offsets+0x4f/0xd0
<4> [<ffffffff8109b716>] ? hrtimer_interrupt+0xe6/0x260
<4> [<ffffffff815172ab>] ? smp_apic_timer_interrupt+0x6b/0x9b
<4> [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
<4> <EOI> [<ffffffff812d39fe>] ? intel_idle+0xde/0x170
<4> [<ffffffff812d39e1>] ? intel_idle+0xc1/0x170
<4> [<ffffffff814152d7>] ? cpuidle_idle_call+0xa7/0x140
<4> [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
<4> [<ffffffff8150704c>] ? start_secondary+0x2ac/0x2ef
Statistically, the crashes are more frequent on high-power, high-RAM machines running RHEL 6.6.
On the same hardware, RHEL 6.6 is roughly 8 times more susceptible to this problem than RHEL 6.4.
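To check whether a captured console log or vmcore-dmesg file shows this same signature, the following is a minimal sketch rather than a Red Hat tool. It assumes the log uses the same line layout as the trace above; the function name `find_cfs_lockup` and both regular expressions are hypothetical and exist only for illustration.

```python
import re

# Matches the panic banner and captures the CPU number it reports.
PANIC_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")

# Matches a call-trace frame such as
# "[<ffffffff8106614d>] ? sched_cfs_period_timer+0x11d/0x160"
# and captures the symbol name.
FRAME_RE = re.compile(r"\]\s+\??\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def find_cfs_lockup(log_text):
    """Return the locked-up CPU number if the log contains a hard LOCKUP
    panic whose call trace includes sched_cfs_period_timer, else None."""
    panic = PANIC_RE.search(log_text)
    if panic is None:
        return None
    # Only consider frames printed after the panic banner.
    frames = FRAME_RE.findall(log_text[panic.end():])
    if "sched_cfs_period_timer" in frames:
        return int(panic.group(1))
    return None
```

Passing the full text of a log file to `find_cfs_lockup` returns the CPU number from the panic line when both conditions hold. A hard LOCKUP panic without `sched_cfs_period_timer` in the trace returns `None`, since that would point to a different cause than the one described here.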
Environment
- Red Hat Enterprise Linux (RHEL) 6.6