Show Table of Contents
20.6. Checking for Hardware Errors
Red Hat Enterprise Linux 7 introduced the new hardware event report mechanism (HERM.) This mechanism gathers system-reported memory errors as well as errors reported by the error detection and correction (EDAC) mechanism for dual in-line memory modules (DIMMs) and reports them to user space. The user-space daemon
rasdaemon, catches and handles all reliability, availability, and serviceability (RAS) error events that come from the kernel tracing mechanism, and logs them. The functions previously provided by edac-utils are now replaced by rasdaemon.
To install
rasdaemon, enter the following command as root:
~]# yum install rasdaemon
Start the service as follows:
~]# systemctl start rasdaemon
To make the service run at system start, enter the following command:
~]# systemctl enable rasdaemon
The
ras-mc-ctl utility provides a means to work with EDAC drivers. Enter the following command to see a list of command options:
~]$ ras-mc-ctl --help
Usage: ras-mc-ctl [OPTIONS...]
--quiet Quiet operation.
--mainboard Print mainboard vendor and model for this hardware.
--status Print status of EDAC drivers.
output truncated
To view a summary of memory controller events, run as
root:
~]# ras-mc-ctl --summary
Memory controller events summary:
Corrected on DIMM Label(s): 'CPU_SrcID#0_Ha#0_Chan#0_DIMM#0' location: 0:0:0:-1 errors: 1
No PCIe AER errors.
No Extlog errors.
MCE records summary:
1 MEMORY CONTROLLER RD_CHANNEL0_ERR Transaction: Memory read error errors
2 No Error errors
To view a list of errors reported by the memory controller, run as
root:
~]# ras-mc-ctl --errors
Memory controller events:
1 3172-02-17 00:47:01 -0500 1 Corrected error(s): memory read error at CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 location: 0:0:0:-1, addr 65928, grain 7, syndrome 0 area:DRAM err_code:0001:0090 socket:0 ha:0 channel_mask:1 rank:0
No PCIe AER errors.
No Extlog errors.
MCE events:
1 3171-11-09 06:20:21 -0500 error: MEMORY CONTROLLER RD_CHANNEL0_ERR Transaction: Memory read error, mcg mcgstatus=0, mci Corrected_error, n_errors=1, mcgcap=0x01000c16, status=0x8c00004000010090, addr=0x1018893000, misc=0x15020a086, walltime=0x57e96780, cpuid=0x00050663, bank=0x00000007
2 3205-06-22 00:13:41 -0400 error: No Error, mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x01000c16, status=0x9400000000000000, addr=0x0000abcd, walltime=0x57e967ea, cpuid=0x00050663, bank=0x00000001
3 3205-06-22 00:13:41 -0400 error: No Error, mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x01000c16, status=0x9400000000000000, addr=0x00001234, walltime=0x57e967ea, cpu=0x00000001, cpuid=0x00050663, apicid=0x00000002, bank=0x00000002
These commands are also described in the
ras-mc-ctl(8) manual page.

Where did the comment section go?
Red Hat's documentation publication system recently went through an upgrade to enable speedier, more mobile-friendly content. We decided to re-evaluate our commenting platform to ensure that it meets your expectations and serves as an optimal feedback mechanism. During this redesign, we invite your input on providing feedback on Red Hat documentation via the discussion platform.