Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu xx due to looping in 'sched_cfs_period_timer'

Solution Verified - Updated 2024-08-05T05:21:18+00:00

Issue

Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu xx...

Typical stack trace:

<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 16
<4>Pid: 0, comm: swapper Tainted: P           ---------------    2.6.32-358.11.1.el6.x86_64 #1
<4>Call Trace:
<4> <NMI>  [<ffffffff8150d4f8>] ? panic+0xa7/0x16f
<4> [<ffffffff810e112d>] ? watchdog_overflow_callback+0xcd/0xd0
<4> [<ffffffff81116f30>] ? __perf_event_overflow+0xb0/0x2a0
<4> [<ffffffff8101a89d>] ? x86_perf_event_update+0x5d/0xb0
<4> [<ffffffff8101b82d>] ? x86_perf_event_set_period+0xdd/0x170
<4> [<ffffffff81117554>] ? perf_event_overflow+0x14/0x20
<4> [<ffffffff810208c2>] ? intel_pmu_handle_irq+0x192/0x300
<4> [<ffffffff815130d6>] ? kprobe_exceptions_notify+0x16/0x430
<4> [<ffffffff81511c49>] ? perf_event_nmi_handler+0x39/0xb0
<4> [<ffffffff81513705>] ? notifier_call_chain+0x55/0x80
<4> [<ffffffff8151376a>] ? atomic_notifier_call_chain+0x1a/0x20
<4> [<ffffffff8109cc1e>] ? notify_die+0x2e/0x30
<4> [<ffffffff815113cb>] ? do_nmi+0x1bb/0x340
<4> [<ffffffff81510c90>] ? nmi+0x20/0x30
<4> [<ffffffff810657e1>] ? enqueue_entity+0x1/0x410
<4> <<EOE>>  <IRQ>  [<ffffffff81065dfb>] ? unthrottle_cfs_rq+0x10b/0x190
<4> [<ffffffffa034e62d>] ? __stp_time_timer_callback+0xbd/0xe0 [stap_13e229648ae4daabe7ad13af6149179_105193]
<4> [<ffffffff81065f2b>] ? distribute_cfs_runtime+0xab/0xd0
<4> [<ffffffff8106614d>] ? sched_cfs_period_timer+0x11d/0x160
<4> [<ffffffff81066030>] ? sched_cfs_period_timer+0x0/0x160
<4> [<ffffffff8109b3ae>] ? __run_hrtimer+0x8e/0x1a0
<4> [<ffffffff810a209f>] ? ktime_get_update_offsets+0x4f/0xd0
<4> [<ffffffff8109b716>] ? hrtimer_interrupt+0xe6/0x260
<4> [<ffffffff815172ab>] ? smp_apic_timer_interrupt+0x6b/0x9b
<4> [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
<4> <EOI>  [<ffffffff812d39fe>] ? intel_idle+0xde/0x170
<4> [<ffffffff812d39e1>] ? intel_idle+0xc1/0x170
<4> [<ffffffff814152d7>] ? cpuidle_idle_call+0xa7/0x140
<4> [<ffffffff81009fc6>] ? cpu_idle+0xb6/0x110
<4> [<ffffffff8150704c>] ? start_secondary+0x2ac/0x2ef

Statistically, the crashes are more frequent on high-power, high-RAM machines running RHEL 6.6.
On the same hardware, RHEL 6.6 is roughly 8x more susceptible to this problem than RHEL 6.4.
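
To check whether a saved console log shows the same signature as the stack trace above, a small script can look for the hard LOCKUP panic line and then check whether 'sched_cfs_period_timer' appears among the frames that follow. The following is only a minimal sketch: the function name find_cfs_period_timer_lockups, the 40-line window, and the regular expressions are assumptions made for illustration and are not part of any Red Hat tooling.

```python
import re

# Panic line as it appears in the trace above, with a numeric CPU id.
LOCKUP_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")

# Call-trace frame, e.g. "[<ffffffff8106614d>] ? sched_cfs_period_timer+0x11d/0x160".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def find_cfs_period_timer_lockups(log_text, window=40):
    """Return the CPU ids whose hard-LOCKUP trace mentions sched_cfs_period_timer.

    `window` (an assumed default) is how many lines after the panic line
    are inspected for call-trace frames.
    """
    lines = log_text.splitlines()
    cpus = []
    for i, line in enumerate(lines):
        match = LOCKUP_RE.search(line)
        if not match:
            continue
        # Collect the first function name found on each following line.
        frames = []
        for following in lines[i + 1 : i + 1 + window]:
            frame = FRAME_RE.search(following)
            if frame:
                frames.append(frame.group(1))
        if "sched_cfs_period_timer" in frames:
            cpus.append(int(match.group(1)))
    return cpus
```

Calling find_cfs_period_timer_lockups() on the text of the trace shown in this article would report CPU 16. A hard LOCKUP panic whose trace does not pass through 'sched_cfs_period_timer' is not reported, since it likely has a different cause.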

Environment

Red Hat Enterprise Linux (RHEL) 6.6

