Are there any software workarounds for incorrect MCE warnings on certain HP hardware using Intel Xeon E7 processors?

Solution Verified - Updated -

Issue

Certain ProLiant DL580 G7 series servers (listed in the Environment section below) utilizing Intel Xeon E7
Family processors may experience Correctable Machine Check (CMC) memory errors. These CMC errors are
not operating system-dependent and may occur with any operating system. With high-speed processor
buses, a low rate of CMC memory error events is normal. However, on rare occasions, a higher than
expected number of CMC memory errors may be logged in the IA32_MC8_Status Model-Specific Register (MSR)
or the IA32_MC9_Status MSR of the processors across a single processor package.
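To make the register references above concrete, the following is a minimal sketch of how the status MSR address for a given machine-check bank can be computed and how a raw status value can be split into the fields that distinguish a corrected error from an uncorrected one. It assumes the architectural IA32_MCi_STATUS layout from the Intel Software Developer's Manual (status MSRs start at 0x401 with a stride of 4 per bank). The function names and the sample value are hypothetical and are not taken from this article or from a real log.

```python
# Sketch only: field positions assume the architectural IA32_MCi_STATUS layout.
# The sample value at the bottom is invented for illustration.

IA32_MC0_STATUS = 0x401  # address of bank 0's status MSR
BANK_STRIDE = 4          # each bank has CTL, STATUS, ADDR and MISC registers


def mc_status_msr(bank: int) -> int:
    """Return the MSR address of IA32_MC<bank>_STATUS."""
    return IA32_MC0_STATUS + BANK_STRIDE * bank


def decode_mc_status(value: int) -> dict:
    """Split a raw 64-bit status value into fields relevant to CMC errors."""
    valid = bool((value >> 63) & 1)        # VAL: register holds a logged error
    uncorrected = bool((value >> 61) & 1)  # UC: error was not corrected
    return {
        "valid": valid,
        "overflow": bool((value >> 62) & 1),  # OVER: an earlier error was overwritten
        "uncorrected": uncorrected,
        "corrected": valid and not uncorrected,
        # Bits 52:38 hold a corrected-error count only on processors that
        # report support for it; treat the value as advisory otherwise.
        "corrected_count": (value >> 38) & 0x7FFF,
        "mca_error_code": value & 0xFFFF,
    }


# Hypothetical raw value: valid, corrected, count of 5, error code 0x009F.
sample = (1 << 63) | (5 << 38) | 0x009F
fields = decode_mc_status(sample)

print(hex(mc_status_msr(8)), hex(mc_status_msr(9)))
print(fields)
```

Under these assumptions, banks 8 and 9 map to MSR addresses 0x421 and 0x425. Reading the live registers requires root privileges and a tool or driver that exposes MSRs, which is outside the scope of this sketch.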

Linux-based operating systems report these events in the "mcelog" output or in /var/log/mcelog,
if that log file exists. The message "machine-check event logged" also appears in the "dmesg" output.
Although the errors are benign, they may "taint" the Linux kernel.

  • This issue is also seen on AMD-based systems such as the AMD Opteron(tm) Processor 6174.
  • Is this purely a hardware fault or is there something in software that can relieve this?
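
The symptoms described above (logged machine-check messages and a tainted kernel) can be checked with a short script. The following is a minimal sketch, not a supported diagnostic tool. It assumes that the kernel message matches a pattern like "machine check events logged" (the exact wording varies between kernel versions) and that bit 4 (value 16) of /proc/sys/kernel/tainted marks a machine-check taint, as described in the upstream kernel documentation. All function names and the sample log text are hypothetical.

```python
import re

# Assumption: tolerate both "machine-check event logged" (as quoted in this
# article) and "Machine check events logged" (as printed by many kernels).
MCE_PATTERN = re.compile(r"machine[- ]check events? logged", re.IGNORECASE)

# Assumption: bit 4 of the taint mask is the machine-check ('M') flag.
TAINT_MACHINE_CHECK = 1 << 4


def count_mce_messages(log_text: str) -> int:
    """Count log lines that report a logged machine-check event."""
    return sum(1 for line in log_text.splitlines() if MCE_PATTERN.search(line))


def is_machine_check_tainted(taint_value: int) -> bool:
    """Return True if the taint mask has the machine-check bit set."""
    return bool(taint_value & TAINT_MACHINE_CHECK)


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the kernel taint mask from procfs (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


# Invented sample text standing in for captured "dmesg" output.
sample_dmesg = """\
[   12.345678] mce: [Hardware Error]: Machine check events logged
[   13.000000] eth0: link up
[   99.100000] Machine check events logged
"""

mce_lines = count_mce_messages(sample_dmesg)
print("machine-check messages:", mce_lines)
print("tainted by machine check:", is_machine_check_tainted(16))
```

On a live system, the captured output of "dmesg" can be passed to count_mce_messages() and the result of read_taint() to is_machine_check_tainted(). A set machine-check taint bit only shows that such an event occurred since boot; it does not by itself indicate whether the error was benign.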

Environment

  • Red Hat Enterprise Linux (RHEL) 5
  • Red Hat Enterprise Linux (RHEL) 6
  • HP ProLiant DL580 G7 Server Series with Intel Xeon E7 Family processors
  • HP ProLiant DL980 G7 Server Series with Intel Xeon E7 Family processors
  • HP ProLiant BL680c G7 Server Blade Series with Intel Xeon E7 Family processors
