perf reliably crashes the system through spurious NMIs or causes unknown NMI reports to be received
Issue
- The perf record command reliably crashes HP DL Gen9 servers. After the crash, the kernel log contains a trace similar to this:
Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details.
Pid: 7646, comm: java Not tainted 2.6.32-431.3.1.el6.x86_64 #1
Call Trace:
<NMI> [<ffffffff81527213>] ? panic+0xa7/0x16f
[<ffffffff8152cef6>] ? kprobe_exceptions_notify+0x16/0x430
[<ffffffffa002a4df>] ? hpwdt_pretimeout+0x9f/0xcc [hpwdt]
[<ffffffff8152d525>] ? notifier_call_chain+0x55/0x80
[<ffffffff8152d58a>] ? atomic_notifier_call_chain+0x1a/0x20
[<ffffffff810a155e>] ? notify_die+0x2e/0x30
[<ffffffff8152b247>] ? do_nmi+0x217/0x340
[<ffffffff8152aab0>] ? nmi+0x20/0x30
[<ffffffff8103ec0a>] ? native_write_msr_safe+0xa/0x10
<<EOE>> <IRQ> [<ffffffff810229d8>] ? intel_pmu_disable_all+0x38/0x70
[<ffffffff8101ccc2>] ? x86_pmu_disable+0x52/0x60
[<ffffffff8111429b>] ? perf_pmu_disable+0x2b/0x40
[<ffffffff8111a641>] ? perf_adjust_freq_unthr_context+0x61/0x1b0
[<ffffffff8111a847>] ? perf_event_task_tick+0xb7/0x2e0
[<ffffffff8105dd5c>] ? scheduler_tick+0xcc/0x260
[<ffffffff810acdf0>] ? tick_sched_timer+0x0/0xc0
[<ffffffff8108435e>] ? update_process_times+0x6e/0x90
[<ffffffff810ace56>] ? tick_sched_timer+0x66/0xc0
[<ffffffff810ecb5a>] ? __rcu_process_callbacks+0x25a/0x350
[<ffffffff8109f9fe>] ? __run_hrtimer+0x8e/0x1a0
[<ffffffff810a6e1f>] ? ktime_get_update_offsets+0x4f/0xd0
[<ffffffff8109fd66>] ? hrtimer_interrupt+0xe6/0x260
[<ffffffff81031f1d>] ? local_apic_timer_interrupt+0x3d/0x70
[<ffffffff81037159>] ? native_apic_msr_eoi_write+0x19/0x20
[<ffffffff815310b5>] ? smp_apic_timer_interrupt+0x45/0x60
[<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
(...)
- When the performance counters of a 4th Generation Intel Core or Intel Xeon v3 processor are used through the perf or perf top tools, spurious non-maskable interrupts (NMIs) are received, even under moderate load, and the logs fill with NMI messages.
- On Haswell systems, spurious NMIs received while using perf counters crash the system.
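To gauge how often these symptoms appear in a saved kernel log, the messages can be counted with a short script. The following is a minimal sketch, not a supported tool: the function name summarize_nmi_messages is hypothetical, and the wording of the unknown-NMI report ("NMI received for unknown reason ...") is an assumption about what the affected kernel prints, so adjust the pattern to match your own logs. The panic string is taken from the trace above.

```python
import re

# Assumed wording of the kernel's unknown-NMI report; adjust for your kernel.
UNKNOWN_NMI = re.compile(r"NMI received for unknown reason\s+([0-9a-fA-F]+)")
# Panic text as it appears in the trace shown in this article.
PANIC_NMI = "Kernel panic - not syncing: An NMI occurred"


def summarize_nmi_messages(log_text):
    """Count unknown-NMI reports and NMI panics in a kernel log string."""
    by_reason = {}
    panics = 0
    for line in log_text.splitlines():
        match = UNKNOWN_NMI.search(line)
        if match:
            # Group the reports by the reason code printed by the kernel.
            reason = match.group(1).lower()
            by_reason[reason] = by_reason.get(reason, 0) + 1
        if PANIC_NMI in line:
            panics += 1
    return {
        "unknown_nmi": sum(by_reason.values()),
        "by_reason": by_reason,
        "panics": panics,
    }
```

Pass the function the contents of a log file, for example the text read from /var/log/messages. A steadily growing unknown_nmi count while perf is running is consistent with the second symptom above.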
Environment
- Red Hat Enterprise Linux 6.5
- System with a 4th Generation Intel Core or Intel Xeon v3 processor (Haswell microarchitecture)
- perf tool
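To check whether a system matches the processor listed above, the contents of /proc/cpuinfo can be inspected. The following is a rough sketch under stated assumptions: the helper names parse_first_cpu and is_haswell are hypothetical, and the set of model numbers is an assumption based on the family 6 models commonly listed for Haswell parts, so confirm it against Intel's documentation for your CPU.

```python
# Family 6 model numbers commonly listed for Haswell parts.
# This set is an assumption; verify it for your processor.
HASWELL_MODELS = {0x3C, 0x3F, 0x45, 0x46}


def parse_first_cpu(cpuinfo_text):
    """Return the key/value fields of the first processor entry."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_haswell(cpuinfo_text):
    """Report whether the first CPU looks like an Intel Haswell part."""
    cpu = parse_first_cpu(cpuinfo_text)
    try:
        family = int(cpu.get("cpu family", ""))
        model = int(cpu.get("model", ""))
    except ValueError:
        return False  # fields missing or not numeric
    return (
        cpu.get("vendor_id") == "GenuineIntel"
        and family == 6
        and model in HASWELL_MODELS
    )
```

On a running system, read /proc/cpuinfo into a string and pass it to is_haswell. Only the first processor entry is examined, which assumes all CPUs in the system are the same model.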
