The HP NMI watchdog does not always generate a crash dump

Solution In Progress - Updated -

Issue

In certain cases, the hpwdt driver for the HP NMI watchdog is not able to claim a non-maskable interrupt (NMI) generated by the HPE watchdog timer because the NMI was instead consumed by the perfmon driver.

The missing NMI is initiated by one of two conditions:

1. The Generate NMI button on the Integrated Lights-Out (iLO) server management software. This button is triggered by a user.
2. The hpwdt watchdog. The expiration by default sends an NMI to the server. 

Both sequences typically occur when the system is unresponsive. Under normal circumstances, the NMI handler for both these situations calls the kernel panic() function and if configured, the kdump service generates a vmcore file.

Because of the missing NMI, however, kernel panic() is not called and vmcore is not collected.

In the first case (1.), if the system was unresponsive, it remains so. To work around this scenario, use the virtual Power button to reset or power cycle the server.

In the second case (2.), the missing NMI is followed 9 seconds later by a reset from the Automated System Recovery (ASR).

The HPE Gen9 Server line experiences this problem in single-digit percentages. The Gen10 at an even smaller frequency.

(BZ#1602962)

Environment

Red Hat Enterprise Linux 8

Subscriber exclusive content

A Red Hat subscription provides unlimited access to our knowledgebase, tools, and much more.

Current Customers and Partners

Log in for full access

Log In

New to Red Hat?

Learn more about Red Hat subscriptions

Using a Red Hat product through a public cloud?

How to access this content