perf reliably crashes the system through spurious NMI or causes unknown NMI reports to be received

Solution Verified

Issue

  • The perf record command reliably crashes HP DL Gen9 servers. After the crash, the kernel log shows a trace similar to this:
Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details.

Pid: 7646, comm: java Not tainted 2.6.32-431.3.1.el6.x86_64 #1
Call Trace:
 <NMI> [<ffffffff81527213>] ? panic+0xa7/0x16f
 [<ffffffff8152cef6>] ? kprobe_exceptions_notify+0x16/0x430
 [<ffffffffa002a4df>] ? hpwdt_pretimeout+0x9f/0xcc [hpwdt]
 [<ffffffff8152d525>] ? notifier_call_chain+0x55/0x80
 [<ffffffff8152d58a>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff810a155e>] ? notify_die+0x2e/0x30
 [<ffffffff8152b247>] ? do_nmi+0x217/0x340
 [<ffffffff8152aab0>] ? nmi+0x20/0x30
 [<ffffffff8103ec0a>] ? native_write_msr_safe+0xa/0x10
 <<EOE>> <IRQ> [<ffffffff810229d8>] ? intel_pmu_disable_all+0x38/0x70
 [<ffffffff8101ccc2>] ? x86_pmu_disable+0x52/0x60
 [<ffffffff8111429b>] ? perf_pmu_disable+0x2b/0x40
 [<ffffffff8111a641>] ? perf_adjust_freq_unthr_context+0x61/0x1b0
 [<ffffffff8111a847>] ? perf_event_task_tick+0xb7/0x2e0
 [<ffffffff8105dd5c>] ? scheduler_tick+0xcc/0x260
 [<ffffffff810acdf0>] ? tick_sched_timer+0x0/0xc0
 [<ffffffff8108435e>] ? update_process_times+0x6e/0x90
 [<ffffffff810ace56>] ? tick_sched_timer+0x66/0xc0
 [<ffffffff810ecb5a>] ? __rcu_process_callbacks+0x25a/0x350
 [<ffffffff8109f9fe>] ? __run_hrtimer+0x8e/0x1a0
 [<ffffffff810a6e1f>] ? ktime_get_update_offsets+0x4f/0xd0
 [<ffffffff8109fd66>] ? hrtimer_interrupt+0xe6/0x260
 [<ffffffff81031f1d>] ? local_apic_timer_interrupt+0x3d/0x70
 [<ffffffff81037159>] ? native_apic_msr_eoi_write+0x19/0x20
 [<ffffffff815310b5>] ? smp_apic_timer_interrupt+0x45/0x60
 [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
(...)
  • When 4th Generation Intel Core or Intel Xeon v3 processor performance counters are used through the perf or perftop tools, spurious non-maskable interrupts (NMIs) are received even under moderate load, filling the logs with NMI messages.
  • On Haswell, a spurious NMI raised while perf counters are in use crashes the system.
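
To compare a saved kernel log against the trace above, the three signs it shows can be checked mechanically: the NMI panic message, an hpwdt_pretimeout frame, and a perf PMU frame. The following is a minimal sketch and not a Red Hat tool; the function names classify_trace and matches_issue are hypothetical, and the symbol prefixes are taken only from the trace shown in this article.

```python
import re

# Panic message exactly as it appears in the trace above.
PANIC_RE = re.compile(r"Kernel panic - not syncing: An NMI occurred")

# A stack frame such as "[<ffffffff81527213>] ? panic+0xa7/0x16f".
# Captures the symbol name that precedes "+0xOFFSET/0xSIZE".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+\??\s*([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Assumption: these prefixes identify perf/PMU frames, based on the
# symbols visible in the example trace only.
PERF_PREFIXES = ("intel_pmu_", "x86_pmu_", "perf_")


def classify_trace(log_text):
    """Report which of the three signs are present in log_text."""
    symbols = FRAME_RE.findall(log_text)
    return {
        "nmi_panic": bool(PANIC_RE.search(log_text)),
        "hpwdt_frame": "hpwdt_pretimeout" in symbols,
        "perf_pmu_frame": any(s.startswith(PERF_PREFIXES) for s in symbols),
    }


def matches_issue(log_text):
    """True only when all three signs are present."""
    return all(classify_trace(log_text).values())
```

A match only indicates that the log resembles the trace in this article; it does not confirm the root cause.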

Environment

  • Red Hat Enterprise Linux 6.5
  • System with 4th generation Intel Core/Intel Xeon v3 processor (Haswell microarchitecture)
  • perf tool
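
One way to check whether a machine falls into the processor environment listed above is to inspect the first processor entry in /proc/cpuinfo. The sketch below is illustrative only: the function name is_haswell is hypothetical, and the model numbers are an assumption based on the commonly documented Haswell identifiers for Intel family 6 (60, 63, 69 and 70), not something stated in this article.

```python
# Assumption: Intel family 6 model numbers commonly listed for Haswell
# (0x3C, 0x3F, 0x45, 0x46). This set is not taken from the article.
HASWELL_MODELS = {0x3C, 0x3F, 0x45, 0x46}


def is_haswell(cpuinfo_text):
    """Return True if the first processor stanza looks like Haswell."""
    fields = {}
    for line in cpuinfo_text.strip().splitlines():
        if not line.strip():
            break  # a blank line ends the first processor stanza
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    model = fields.get("model", "")
    return (
        fields.get("vendor_id") == "GenuineIntel"
        and fields.get("cpu family") == "6"
        and model.isdigit()
        and int(model) in HASWELL_MODELS
    )
```

On a live system the text could be read with open("/proc/cpuinfo").read() and passed to the function.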
