Red Hat Training

A Red Hat training course is available for RHEL 8

Chapter 31. Using Advanced Error Reporting

When you use the Advanced Error Reporting (AER), you receive notifications of error events for Peripheral Component Interconnect Express (PCIe) devices. RHEL enables this kernel feature by default and collects the reported errors in the kernel logs. Moreover, if you use the rasdaemon program, these errors are parsed and stored in its database.

31.1. Overview of AER

Advanced Error Reporting (AER) is a kernel feature that provides enhanced error reporting for Peripheral Component Interconnect Express (PCIe) devices. The AER kernel driver attaches root ports which support PCIe AER capability in order to:

  • Gather the comprehensive error information
  • Report errors to the users
  • Perform error recovery actions

When AER captures an error, it sends an error message to the console. For a repairable error, the console output is a warning.

Example 31.1. Example AER output

Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Corrected error received: id=ae00
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Multiple Corrected error received: id=ae00
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:   device [8086:2030] error status/mask=000000c0/00002000
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:    [ 6] Bad TLP
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:    [ 7] Bad DLLP
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Multiple Corrected error received: id=ae00
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)
Feb  5 15:41:33 hostname kernel: pcieport 10003:00:00.0:   device [8086:2030] error status/mask=00000040/00002000

31.2. Collecting and displaying AER messages

In order to collect and display AER messages, use the rasdaemon program.

Procedure

  1. Install the rasdaemon package.

    # yum install rasdaemon
  2. Enable and start the rasdaemon service.

    # systemctl enable --now rasdaemon
    Created symlink /etc/systemd/system/multi-user.target.wants/rasdaemon.service → /usr/lib/systemd/system/rasdaemon.service.
  3. Issue the ras-mc-ctl command.

    # ras-mc-ctl --summary
    # ras-mc-ctl --errors

    The command displays a summary of the logged errors (the --summary option) or displays the errors stored in the error database (the --errors option).

Additional resources

  • The rasdaemon(8) manual page
  • The ras-mc-ctl(8) manual page