Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu xx due to looping in 'sched_cfs_period_timer'

Solution Verified - Updated -

Issue

Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu xx...

Typical stack trace:

<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 16
<4>Pid: 0, comm: swapper Tainted: P           ---------------    2.6.32-358.11.1.el6.x86_64 #1
<4>Call Trace:
<4> <NMI>  [<ffffffff8150d4f8>] ? panic+0xa7/0x16f
<4> [<ffffffff810e112d>] ? watchdog_overflow_callback+0xcd/0xd0
<4> [<ffffffff81116f30>] ? __perf_event_overflow+0xb0/0x2a0
<4> [<ffffffff8101a89d>] ? x86_perf_event_update+0x5d/0xb0
<4> [<ffffffff8101b82d>] ? x86_perf_event_set_period+0xdd/0x170
<4> [<ffffffff81117554>] ? perf_event_overflow+0x14/0x20
<4> [<ffffffff810208c2>] ? intel_pmu_handle_irq+0x192/0x300
<4> [<ffffffff815130d6>] ? kprobe_exceptions_notify+0x16/0x430
<4> [<ffffffff81511c49>] ? perf_event_nmi_handler+0x39/0xb0
<4> [<ffffffff81513705>] ? notifier_call_chain+0x55/0x80
<4> [<ffffffff8151376a>] ? atomic_notifier_call_chain+0x1a/0x20
<4> [<ffffffff8109cc1e>] ? notify_die+0x2e/0x30
<4> [<ffffffff815113cb>] ? do_nmi+0x1bb/0x340
<4> [<ffffffff81510c90>] ? nmi+0x20/0x30
<4> [<ffffffff810657e1>] ? enqueue_entity+0x1/0x410
<4> <<EOE>>  <IRQ>  [<ffffffff81065dfb>] ? unthrottle_cfs_rq+0x10b/0x190
<4> [<ffffffffa034e62d>] ? __stp_time_timer_callback+0xbd/0xe0 [stap_13e229648ae4daabe7ad13af6149179_105193]
<4> [<ffffffff81065f2b>] ? distribute_cfs_runtime+0xab/0xd0
<4> [<ffffffff8106614d>] ? sched_cfs_period_timer+0x11d/0x160
<4> [<ffffffff81066030>] ? sched_cfs_period_timer+0x0/0x160
<4> [<ffffffff8109b3ae>] ? __run_hrtimer+0x8e/0x1a0
<4> [<ffffffff810a209f>] ? ktime_get_update_offsets+0x4f/0xd0
<4> [<ffffffff8109b716>] ? hrtimer_interrupt+0xe6/0x260
<4> [<ffffffff815172ab>] ? smp_apic_timer_interrupt+0x6b/0x9b
<4> [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
<4> <EOI>  [<ffffffff812d39fe>] ? intel_idle+0xde/0x170
<4> [<ffffffff812d39e1>] ? intel_idle+0xc1/0x170
<4> [<ffffffff814152d7>] ? cpuidle_idle_call+0xa7/0x140
<4> [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
<4> [<ffffffff8150704c>] ? start_secondary+0x2ac/0x2ef

Statistically, the crashes are more frequent on high-power/high-RAM machines running RHEL 6.6.
On the same hardware, RHEL 6.6 is roughly 8 times more susceptible to this problem than RHEL 6.4.
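
To check whether a captured console or vmcore-dmesg log shows the same symptom as the trace above, a minimal sketch along the following lines may help. The function name `match_signature` and both regular expressions are illustrative assumptions, not part of any Red Hat tool. A match only indicates that the panic message and call trace resemble this issue; it does not confirm the root cause.

```python
import re

# Panic line as it appears in the trace above; captures the CPU number.
PANIC_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")

# A call-trace frame such as "[<ffffffff8106614d>] ? sched_cfs_period_timer+0x11d/0x160";
# captures the symbol name only.
FRAME_RE = re.compile(r"\]\s+\??\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def match_signature(log_text):
    """Return the CPU number if the log contains a hard LOCKUP panic whose
    call trace passes through sched_cfs_period_timer, otherwise None.

    Hypothetical helper: it inspects only the first panic found in the text.
    """
    panic = PANIC_RE.search(log_text)
    if panic is None:
        return None
    # Collect symbol names from every frame printed after the panic line.
    frames = FRAME_RE.findall(log_text[panic.end():])
    if "sched_cfs_period_timer" in frames:
        return int(panic.group(1))
    return None
```

Passing the text of the typical stack trace shown above to `match_signature` would be expected to report the CPU named in the panic line; a log whose trace never enters `sched_cfs_period_timer` yields `None`.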

Environment

  • Red Hat Enterprise Linux (RHEL) 6.6
