NMI messages and MCEs on HP ProLiant DL585 G5/G6 or DL685 G5
Issue
- Non-maskable interrupt (NMI) messages are received on HP ProLiant DL585 G6 systems:
Jan 12 22:47:05 example kernel: Uhhuh. NMI received for unknown reason 30.
Jan 12 22:47:05 example kernel: Dazed and confused, but trying to continue
Jan 12 22:47:05 example kernel: Do you have a strange power saving mode enabled?
- The system reboots or hangs after the NMI errors above, without generating a core dump.
- The system generates a Machine Check Exception (MCE) referencing bank 4, which on AMD processors indicates the Northbridge or DRAM:
CPU 1: Machine Check Exception: 4 Bank 4: ba00000000070f0f
TSC 622520147de MISC e00c0ffe01000000
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 4 TSC 622520147de [at 1867 Mhz 0 days 1:0:13 uptime (unreliable)]
MISC e00c0ffe01000000
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
Processor context corrupt
MCA: BUS Level-3 Generic Generic Other-transaction Request-no-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
STATUS ba00000000070f0f MCGSTATUS 4
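The raw values in these logs can be checked by hand. The sketch below is hypothetical: all names are illustrative and are not from a Red Hat or HP tool. It assumes the standard x86 layouts. The NMI "reason" is the byte the kernel reads from I/O port 0x61 and prints in hex. Only bit 7 (SERR#/memory parity) and bit 6 (IOCHK) identify an error source there. The top bits of MCi_STATUS and the low bits of MCG_STATUS are architectural flags shared by Intel and AMD processors. The low 16 bits of MCi_STATUS are extracted as the MCA error code, but their vendor-specific meaning is not decoded here.

```python
# Hypothetical helpers for decoding the raw values in the NMI and MCE logs above.

# NMI reason byte (I/O port 0x61): only these two bits name an error source.
NMI_REASON_SERR = 0x80   # PCI SERR# / memory parity error
NMI_REASON_IOCHK = 0x40  # I/O channel check


def nmi_reason_is_known(reason: int) -> bool:
    """Return True if SERR# or IOCHK is set; otherwise the kernel reports 'unknown reason'."""
    return bool(reason & (NMI_REASON_SERR | NMI_REASON_IOCHK))


# Architectural MCi_STATUS flag bits (bit number -> meaning).
MCI_STATUS_BITS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# MCG_STATUS flag bits (bit number -> name).
MCG_STATUS_BITS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def set_flags(value: int, table: dict) -> list:
    """List the names of the flags set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def mca_error_code(status: int) -> int:
    """Return the low 16 bits of MCi_STATUS (the MCA error code field)."""
    return status & 0xFFFF


if __name__ == "__main__":
    status = 0xBA00000000070F0F                  # "STATUS" from the log
    print(nmi_reason_is_known(0x30))             # reason 30 (hex) from the NMI log
    print(set_flags(status, MCI_STATUS_BITS))
    print(set_flags(0x4, MCG_STATUS_BITS))       # "MCGSTATUS 4" from the log
    print(hex(mca_error_code(status)))
```

For reason 0x30, neither bit 0x80 nor bit 0x40 is set, so the kernel cannot attribute the NMI and prints the "unknown reason" message. For status ba00000000070f0f, the flags set are VAL, UC, EN, MISCV and PCC. These match the "Uncorrected error", "Error enabled", "MCi_MISC register valid" and "Processor context corrupt" lines that mcelog printed. MCGSTATUS 4 decodes to MCIP alone.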
Environment
- Red Hat Enterprise Linux (RHEL) 4
- Red Hat Enterprise Linux 5
- Red Hat Enterprise Linux 6
- HP ProLiant DL585 G5 or G6
- HP ProLiant DL685 G5
- Broadcom NetXtreme II 5709 NIC