Can Red Hat Enterprise Linux monitor hardware faults through syslog in Linux?

Environment

Red Hat Enterprise Linux (RHEL) 6
Red Hat Enterprise Linux (RHEL) 7

Issue

Can Red Hat Enterprise Linux monitor hardware faults through syslog in Linux?
Is hardware fault management through syslog possible?

Resolution

The following options are available:

Error Detection and Correction (EDAC) and Machine Check Exception (MCE) events can be monitored using the mcelog daemon (mcelogd). When started with the --syslog option, it logs events to syslog. For more information, please see:

Error Detection and Correction (EDAC) Support available in Red Hat Enterprise Linux
What is mcelog and how can I install it?
Is it necessary to have both EDAC and MCE error reporting modules loaded in the kernel?
What does the message "kernel: Machine check events logged" mean?

From man mcelog:

X86 CPUs report errors detected by the CPU as machine check events (MCEs). These can be data corruption detected in the CPU caches, in main memory by an integrated memory controller, data transfer errors on the front side bus or CPU interconnect or other internal errors. Possible causes can be cosmic radiation, instable power supplies, cooling problems, broken hardware, or bad luck.

Most errors can be corrected by the CPU by internal error correction mechanisms. Uncorrected errors cause machine check exceptions which may panic the machine.

When a corrected error happens the x86 kernel writes a record describing the MCE into a internal ring buffer available through the /dev/mcelog device. mcelog retrieves errors from /dev/mcelog, decodes them into a human readable format and prints them on the standard output or optionally into the system log.

<snip>

When the --syslog option is specified redirect output to system log. The --syslog-error option causes the normal machine checks to be logged as LOG_ERR (implies --syslog). Normally only fatal errors or high level remarks are logged with error level. High level one line summaries of specific errors are also logged to the syslog by default unless mcelog operates in --ascii mode.

The System Event Log (SEL) can be monitored using the ipmievd daemon. Here is some information on the daemon from man ipmievd:

ipmievd is a daemon which will listen for events from the BMC that are being sent to the SEL and also log those messages to syslog. It is able to run in one of two modes: either using the Event Message Buffer and asynchronous event notification from the OpenIPMI kernel driver or actively polling the contents of the SEL for new events. Upon receipt of an event via either mechanism it will be logged to syslog with the LOG_LOCAL4 facility.

It is based on the ipmitool utility and shares the same IPMI interface support and session setup options. Please see the ipmitool manpage for more information on supported IPMI interfaces.
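
Because ipmievd logs with the LOG_LOCAL4 facility, it can help to confirm where the local syslog configuration sends local4 messages before relying on the daemon. The following is a minimal sketch, assuming Python 2.7 or 3 is installed. The function name send_local4_test and the ipmievd-test tag are hypothetical and are not part of ipmievd or ipmitool.

```python
import syslog


def send_local4_test(message="ipmievd routing test"):
    """Send one notice-level message on the LOCAL4 facility.

    Returns the combined priority value (facility | severity) that was used.
    """
    # "ipmievd-test" is a hypothetical tag chosen to make the line easy to find.
    syslog.openlog("ipmievd-test", 0, syslog.LOG_LOCAL4)
    syslog.syslog(syslog.LOG_NOTICE, message)
    syslog.closelog()
    return syslog.LOG_LOCAL4 | syslog.LOG_NOTICE
```

After calling send_local4_test(), search the syslog output files for the ipmievd-test tag to see which file receives local4 messages on the system.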

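The Intelligent Platform Management Interface (IPMI) can also be used to query hardware sensors via the ipmitool utility. This makes it possible to poll the sensors manually (for example, from a scheduled script) and then write a log message with the logger utility when a reading is out of range. Here is sample output of ipmitool sensor to show what data is available. The columns are sensor name, current reading, units, status, and then six thresholds: lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical, and upper non-recoverable: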

System Temp      | 52.000     | degrees C  | ok    | -9.000    | -7.000    | -5.000    | 75.000    | 77.000    | 79.000    
CPU Temp         | 64.000     | degrees C  | ok    | -11.000   | -8.000    | -5.000    | 85.000    | 90.000    | 95.000    
CPU FAN          | na         | RPM        | na    | na        | na        | na        | na        | na        | na        
SYS FAN          | na         | RPM        | na    | na        | na        | na        | na        | na        | na        
CPU Vcore        | 1.168      | Volts      | ok    | 0.640     | 0.664     | 0.688     | 1.344     | 1.408     | 1.472     
Vnbcore          | 1.056      | Volts      | ok    | 0.808     | 0.824     | 0.840     | 1.160     | 1.176     | 1.192     
+3.3VCC          | 3.312      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712     
VDIMM            | 1.848      | Volts      | ok    | 1.448     | 1.480     | 1.512     | 1.960     | 1.992     | 2.024     
+5 V             | 5.088      | Volts      | ok    | 4.096     | 4.320     | 4.576     | 5.344     | 5.600     | 5.632     
+12 V            | 12.160     | Volts      | ok    | 10.368    | 10.496    | 10.752    | 12.928    | 13.056    | 13.312    
+3.3VSB          | 3.312      | Volts      | ok    | 2.816     | 2.880     | 2.944     | 3.584     | 3.648     | 3.712     
VBAT             | 2.864      | Volts      | ok    | 2.560     | 2.624     | 2.688     | 3.328     | 3.392     | 3.456     
Chassis Intru    | 0x0        | discrete   | 0x0000| na        | na        | na        | na        | na        | na        
PS Status        | 0x1        | discrete   | 0x01ff| na        | na        | na        | na        | na        | na
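
The manual approach described above (read the sensors, then call logger) can be sketched as follows. This is a minimal illustration rather than a supported tool. It assumes Python 2.7 or 3, the ten-column pipe-separated layout shown in the sample output, and that ipmitool and logger are in the PATH. The function names, the hwmon tag, and the daemon.err priority are hypothetical choices.

```python
import subprocess


def parse_sensor_line(line):
    """Return a dict for a threshold sensor with a numeric reading, else None."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 10 or fields[2] == "discrete":
        return None

    def to_number(text):
        # Unavailable values are printed as "na"; treat them as missing.
        try:
            return float(text)
        except ValueError:
            return None

    reading = to_number(fields[1])
    if reading is None:
        return None
    return {
        "name": fields[0],
        "reading": reading,
        "unit": fields[2],
        "lower_critical": to_number(fields[5]),
        "upper_critical": to_number(fields[8]),
    }


def find_faults(output):
    """Return one message per sensor at or beyond a critical threshold."""
    faults = []
    for line in output.splitlines():
        sensor = parse_sensor_line(line)
        if sensor is None:
            continue
        low = sensor["lower_critical"]
        high = sensor["upper_critical"]
        too_low = low is not None and sensor["reading"] <= low
        too_high = high is not None and sensor["reading"] >= high
        if too_low or too_high:
            faults.append("%s reading %.3f %s is outside the critical range"
                          % (sensor["name"], sensor["reading"], sensor["unit"]))
    return faults


def read_sensors():
    """Run `ipmitool sensor` and return its output as text (usually needs root)."""
    return subprocess.check_output(["ipmitool", "sensor"],
                                   universal_newlines=True)


def log_faults(output):
    """Write each fault to syslog through the logger utility."""
    for message in find_faults(output):
        # The tag "hwmon" and priority daemon.err are illustrative choices.
        subprocess.check_call(["logger", "-t", "hwmon", "-p", "daemon.err",
                               message])
```

Calling log_faults(read_sensors()) from a scheduled job would write one syslog entry per sensor that is at or beyond a critical threshold. Sensors that report na and discrete sensors such as Chassis Intru are skipped in this sketch.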

Hardware vendor-specific monitoring tools are another option, if they are available for your hardware. Please contact your hardware vendor for more information.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
