Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu xx due to looping in 'sched_cfs_period_timer'

Solution Verified

Issue

The system panics with the message "Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu xx", where xx is the number of the affected CPU.

Typical stack trace:

<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 16
<4>Pid: 0, comm: swapper Tainted: P           ---------------    2.6.32-358.11.1.el6.x86_64 #1
<4>Call Trace:
<4> <NMI>  [<ffffffff8150d4f8>] ? panic+0xa7/0x16f
<4> [<ffffffff810e112d>] ? watchdog_overflow_callback+0xcd/0xd0
<4> [<ffffffff81116f30>] ? __perf_event_overflow+0xb0/0x2a0
<4> [<ffffffff8101a89d>] ? x86_perf_event_update+0x5d/0xb0
<4> [<ffffffff8101b82d>] ? x86_perf_event_set_period+0xdd/0x170
<4> [<ffffffff81117554>] ? perf_event_overflow+0x14/0x20
<4> [<ffffffff810208c2>] ? intel_pmu_handle_irq+0x192/0x300
<4> [<ffffffff815130d6>] ? kprobe_exceptions_notify+0x16/0x430
<4> [<ffffffff81511c49>] ? perf_event_nmi_handler+0x39/0xb0
<4> [<ffffffff81513705>] ? notifier_call_chain+0x55/0x80
<4> [<ffffffff8151376a>] ? atomic_notifier_call_chain+0x1a/0x20
<4> [<ffffffff8109cc1e>] ? notify_die+0x2e/0x30
<4> [<ffffffff815113cb>] ? do_nmi+0x1bb/0x340
<4> [<ffffffff81510c90>] ? nmi+0x20/0x30
<4> [<ffffffff810657e1>] ? enqueue_entity+0x1/0x410
<4> <<EOE>>  <IRQ>  [<ffffffff81065dfb>] ? unthrottle_cfs_rq+0x10b/0x190
<4> [<ffffffffa034e62d>] ? __stp_time_timer_callback+0xbd/0xe0 [stap_13e229648ae4daabe7ad13af6149179_105193]
<4> [<ffffffff81065f2b>] ? distribute_cfs_runtime+0xab/0xd0
<4> [<ffffffff8106614d>] ? sched_cfs_period_timer+0x11d/0x160
<4> [<ffffffff81066030>] ? sched_cfs_period_timer+0x0/0x160
<4> [<ffffffff8109b3ae>] ? __run_hrtimer+0x8e/0x1a0
<4> [<ffffffff810a209f>] ? ktime_get_update_offsets+0x4f/0xd0
<4> [<ffffffff8109b716>] ? hrtimer_interrupt+0xe6/0x260
<4> [<ffffffff815172ab>] ? smp_apic_timer_interrupt+0x6b/0x9b
<4> [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
<4> <EOI>  [<ffffffff812d39fe>] ? intel_idle+0xde/0x170
<4> [<ffffffff812d39e1>] ? intel_idle+0xc1/0x170
<4> [<ffffffff814152d7>] ? cpuidle_idle_call+0xa7/0x140
<4> [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
<4> [<ffffffff8150704c>] ? start_secondary+0x2ac/0x2ef
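
To check whether a captured console or vmcore-dmesg log shows the same signature, the trace can be scanned for the panic line and for sched_cfs_period_timer among the frames. The following is a minimal sketch, not a Red Hat tool: the function name summarize_lockup, the regular expressions, and the shortened SAMPLE text are assumptions made for illustration, and they rely only on the log layout shown in the trace above.

```python
import re

# Panic line emitted by the NMI watchdog; captures the CPU number.
PANIC_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")

# One stack frame, e.g. "[<ffffffff8106614d>] ? sched_cfs_period_timer+0x11d/0x160".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_lockup(log_text):
    """Return the locked-up CPU and the frame symbols that follow the panic line.

    Returns None when the text contains no hard LOCKUP panic line.
    """
    cpu = None
    symbols = []
    for line in log_text.splitlines():
        if cpu is None:
            match = PANIC_RE.search(line)
            if match:
                cpu = int(match.group(1))
            continue  # frames are only collected after the panic line
        symbols.extend(FRAME_RE.findall(line))
    if cpu is None:
        return None
    return {
        "cpu": cpu,
        "symbols": symbols,
        "in_cfs_period_timer": "sched_cfs_period_timer" in symbols,
    }


# Shortened excerpt of the trace above, used only to exercise the function.
SAMPLE = "\n".join([
    "<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 16",
    "<4>Call Trace:",
    "<4> <NMI>  [<ffffffff8150d4f8>] ? panic+0xa7/0x16f",
    "<4> <<EOE>>  <IRQ>  [<ffffffff81065dfb>] ? unthrottle_cfs_rq+0x10b/0x190",
    "<4> [<ffffffff81065f2b>] ? distribute_cfs_runtime+0xab/0xd0",
    "<4> [<ffffffff8106614d>] ? sched_cfs_period_timer+0x11d/0x160",
])

if __name__ == "__main__":
    print(summarize_lockup(SAMPLE))
```

A result with in_cfs_period_timer set to True only indicates that the trace resembles the one in this article; it does not confirm the root cause.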

Statistically, the crashes are more frequent on high-power, high-RAM machines running RHEL 6.6.
On the same hardware, RHEL 6.6 is roughly 8 times more susceptible to this problem than RHEL 6.4.

Environment

  • Red Hat Enterprise Linux (RHEL) 6.6
