NMI messages and MCEs on HP ProLiant DL585 G5/G6 or DL685 G5
Issue
- NMI messages are being received on HP ProLiant DL585 G6 systems:
Jan 12 22:47:05 example kernel: Uhhuh. NMI received for unknown reason 30.
Jan 12 22:47:05 example kernel: Dazed and confused, but trying to continue
Jan 12 22:47:05 example kernel: Do you have a strange power saving mode enabled?
- The system reboots or hangs without generating a core dump following the above NMI errors.
- The system generates a Machine-Check Exception (MCE) referencing bank 4 (indicating Northbridge or DRAM on AMD processors):
CPU 1: Machine Check Exception: 4 Bank 4: ba00000000070f0f
TSC 622520147de MISC e00c0ffe01000000
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 4 TSC 622520147de [at 1867 Mhz 0 days 1:0:13 uptime (unreliable)]
MISC e00c0ffe01000000
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
Processor context corrupt
MCA: BUS Level-3 Generic Generic Other-transaction Request-no-timeout Error
<16:7> BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
STATUS ba00000000070f0f MCGSTATUS 4
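The status flags in the MCE output above can be cross-checked by hand against the raw STATUS value. The following is a minimal sketch, assuming the architectural x86 layout of the upper MCi_STATUS bits (63 to 57); the function name `decode_mci_status` and the dictionary keys are hypothetical and are not part of mcelog or any Red Hat tool. The meaning of the lower error-code fields is vendor- and model-specific, so the sketch only extracts them without interpreting them.

```python
# Architectural flag bits in the upper part of an x86 MCi_STATUS register.
# Bit positions are assumed from the x86 machine-check architecture.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status_hex):
    """Split a hexadecimal MCi_STATUS string into flags and raw code fields."""
    status = int(status_hex, 16)
    # Walk the flag bits from highest to lowest and keep the ones that are set.
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        # Bits 15:0 hold the MCA error code; bits 31:16 hold a
        # model-specific code. Neither is interpreted here.
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }


if __name__ == "__main__":
    decoded = decode_mci_status("ba00000000070f0f")
    print(decoded["flags"])
    print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

For the STATUS value in this article, the set flags are VAL, UC, EN, MISCV and PCC, which agrees with the "Uncorrected error", "Error enabled", "MCi_MISC register valid" and "Processor context corrupt" lines in the log.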
Environment
- Red Hat Enterprise Linux (RHEL) 4
- Red Hat Enterprise Linux 5
- Red Hat Enterprise Linux 6
- HP ProLiant DL585 G5 or G6
- HP ProLiant DL685 G5
- Broadcom NetXtreme II 5709 NIC
