How to inject PCIe AER errors into a running Linux kernel from software?


Environment

  • Red Hat Enterprise Linux 6

Issue

  • How to inject PCIe AER errors into a running Linux kernel from software?

Resolution

The ras-utils RPM provides tools for injecting errors. It is shipped in the RHEL 6 Optional channel.

# rpm -ql ras-utils
/sbin/aer-inject
/sbin/mce-inject
[..]

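For example:

First, check whether the device supports AER by running lspci -vvv as root. Without root privileges, lspci cannot read the extended capabilities and prints "Capabilities: <access denied>" instead. Look for the Advanced Error Reporting capability: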

04:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection (rev 01)

        Capabilities: [100] Advanced Error Reporting
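When checking several devices, the lspci check can be scripted. The sketch below is a minimal example that reads lspci -vvv output on stdin and tests for the capability name shown above; has_aer_capability is a hypothetical helper name, not part of ras-utils.

```shell
#!/bin/sh
# has_aer_capability: hypothetical helper. Reads `lspci -vvv` output on
# stdin and returns 0 if the Advanced Error Reporting capability is listed,
# non-zero otherwise. lspci must run as root, or the extended capabilities
# (where AER lives) show up as "<access denied>" and the check fails.
has_aer_capability() {
    grep -qF 'Advanced Error Reporting'
}
```

Usage, for the device in this example:

# lspci -vvv -s 04:00.1 | has_aer_capability && echo "AER supported"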

Next, create an input file for aer-inject with the following contents:

AER
BUS 4 DEV 0 FN 1
COR_STATUS BAD_TLP
HEADER_LOG 0 1 2 3

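Here BUS, DEV, and FN are the bus, device, and function numbers taken from the device's lspci address, which has the form bus:device.function (04:00.1 in this example). COR_STATUS BAD_TLP requests a corrected Bad TLP error.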
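lspci prints the bus, device, and function fields in hexadecimal, so a bus such as 0a is bus 10. The sketch below generates the input file from an lspci address and writes the numbers in decimal, assuming aer-inject reads these fields as ordinary decimal numbers (as the BUS 4 example suggests). bdf_to_aer is a hypothetical helper name; the COR_STATUS and HEADER_LOG lines are copied unchanged from the example above.

```shell
#!/bin/sh
# bdf_to_aer: hypothetical helper that prints an aer-inject input file for
# the device at the given lspci address ("BB:DD.F" or "DDDD:BB:DD.F").
# printf converts each hexadecimal field (given as 0x..) to decimal.
bdf_to_aer() {
    addr=$1
    # Drop a leading PCI domain ("0000:") if present.
    case $addr in
        *:*:*) addr=${addr#*:} ;;
    esac
    bus=${addr%%:*}        # "04" in 04:00.1
    devfn=${addr#*:}       # "00.1"
    dev=${devfn%%.*}       # "00"
    fn=${devfn#*.}         # "1"
    printf 'AER\n'
    printf 'BUS %d DEV %d FN %d\n' "0x$bus" "0x$dev" "0x$fn"
    printf 'COR_STATUS BAD_TLP\n'
    printf 'HEADER_LOG 0 1 2 3\n'
}
```

Usage: bdf_to_aer 04:00.1 produces exactly the four lines shown above, and can be redirected into the input file:

# bdf_to_aer 04:00.1 > <filename>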

Load the aer-inject kernel module, which creates the /dev/aer_inject device file:

# modprobe aer-inject

Then run aer-inject on the file you created:

# aer-inject <filename>
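The load-and-run steps can be wrapped in one function that stops with a message when something is missing. This is a minimal sketch, assuming it runs as root with ras-utils installed; run_aer_inject is a hypothetical name, and the only commands it calls are the modprobe and aer-inject invocations shown above.

```shell
#!/bin/sh
# run_aer_inject: hypothetical wrapper. Checks the input file, loads the
# aer-inject module, confirms the /dev/aer_inject character device exists,
# then runs aer-inject on the file. Returns non-zero on any failure.
run_aer_inject() {
    file=$1
    if [ ! -r "$file" ]; then
        echo "run_aer_inject: cannot read '$file'" >&2
        return 1
    fi
    modprobe aer-inject || return 1
    if [ ! -c /dev/aer_inject ]; then
        echo "run_aer_inject: /dev/aer_inject not found" >&2
        return 1
    fi
    aer-inject "$file"
}
```

Usage:

# run_aer_inject <filename>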

The kernel then logs messages similar to the following in /var/log/messages:

kernel: pcieport 0000:00:07.0: AER: Corrected error received: id=0401
kernel: ixgbe 0000:04:00.1: PCIE Bus Error: severity=Corrected, type=Data Link Layer, id=0401(Receiver ID)
kernel: ixgbe 0000:04:00.1:   device [8086:10c6] error status/mask=00000040/00002000
kernel: ixgbe 0000:04:00.1:    [ 6] Bad TLP
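The id=0401 field in these messages is the PCI requester ID of the reporting device, (bus << 8) | (device << 3) | function, which for 04:00.1 is 0x0401. The root port's line names only this ID, not the device address. The sketch below computes the ID from lspci's hexadecimal fields and filters log lines that mention either the address or the ID; aer_source_id and aer_log_lines are hypothetical helper names.

```shell
#!/bin/sh
# aer_source_id: hypothetical helper. Takes bus, device, and function in
# lspci's hexadecimal notation and prints the 4-digit ID the kernel logs
# as "id=XXXX": (bus << 8) | (device << 3) | function.
aer_source_id() {
    printf '%04x\n' $(( (0x$1 << 8) | (0x$2 << 3) | 0x$3 ))
}

# aer_log_lines: hypothetical helper. Reads log text on stdin and prints
# the lines that mention the device address or "id=<source ID>".
aer_log_lines() {
    addr=$1
    id=$2
    grep -F -e "$addr" -e "id=$id"
}
```

Usage, for the device in this example:

# aer_log_lines 0000:04:00.1 "$(aer_source_id 04 00 1)" < /var/log/messages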

Note that aer-inject is not guaranteed to work on all hardware that advertises AER capability.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
