"do_IRQ: X.Y No irq handler for vector (irq -1)" ixgbe intel X550 NIC


Similar to this issue: https://access.redhat.com/solutions/110053 ...

... this issue has been occurring on RHEL 7.9 with SR-IOV enabled since kernel 3.10.0-1160.24 and continues on 3.10.0-1160.31. The issue centers on the ixgbe driver (dual X550 NICs). Multiple ssh sessions are active, and the impact ranges from output jumbled by the kernel messages to broken pipes and ssh disconnects.

CPU is AMD EPYC-7302, MB is Asrock EPYCD8-2T, BIOS v. 2.60 (current; required to correct the grub2/shim issue).

Responses

MSI/MSI-X assignments

[    4.610983] ixgbe 0000:42:00.0: irq 131 for MSI/MSI-X
[    4.610991] ixgbe 0000:42:00.0: irq 132 for MSI/MSI-X
[    4.610996] ixgbe 0000:42:00.0: irq 133 for MSI/MSI-X
[    4.611000] ixgbe 0000:42:00.0: irq 134 for MSI/MSI-X
[    4.611005] ixgbe 0000:42:00.0: irq 135 for MSI/MSI-X
[    4.611010] ixgbe 0000:42:00.0: irq 136 for MSI/MSI-X
[    4.611015] ixgbe 0000:42:00.0: irq 137 for MSI/MSI-X
[    4.611022] ixgbe 0000:42:00.0: irq 138 for MSI/MSI-X
[    4.611026] ixgbe 0000:42:00.0: irq 139 for MSI/MSI-X
[    4.611030] ixgbe 0000:42:00.0: irq 140 for MSI/MSI-X
[    4.611033] ixgbe 0000:42:00.0: irq 141 for MSI/MSI-X
[    4.611036] ixgbe 0000:42:00.0: irq 142 for MSI/MSI-X
[    4.611041] ixgbe 0000:42:00.0: irq 143 for MSI/MSI-X
[    4.611044] ixgbe 0000:42:00.0: irq 144 for MSI/MSI-X
[    4.611047] ixgbe 0000:42:00.0: irq 145 for MSI/MSI-X
[    4.611052] ixgbe 0000:42:00.0: irq 146 for MSI/MSI-X
[    4.611055] ixgbe 0000:42:00.0: irq 147 for MSI/MSI-X
[    5.680695] ixgbe 0000:42:00.1: irq 149 for MSI/MSI-X
[    5.680704] ixgbe 0000:42:00.1: irq 150 for MSI/MSI-X
[    5.680711] ixgbe 0000:42:00.1: irq 151 for MSI/MSI-X
[    5.680718] ixgbe 0000:42:00.1: irq 152 for MSI/MSI-X
[    5.680725] ixgbe 0000:42:00.1: irq 153 for MSI/MSI-X
[    5.680733] ixgbe 0000:42:00.1: irq 154 for MSI/MSI-X
[    5.680741] ixgbe 0000:42:00.1: irq 155 for MSI/MSI-X
[    5.680747] ixgbe 0000:42:00.1: irq 156 for MSI/MSI-X
[    5.680755] ixgbe 0000:42:00.1: irq 157 for MSI/MSI-X
[    5.680761] ixgbe 0000:42:00.1: irq 158 for MSI/MSI-X
[    5.680767] ixgbe 0000:42:00.1: irq 159 for MSI/MSI-X
[    5.680773] ixgbe 0000:42:00.1: irq 160 for MSI/MSI-X
[    5.680779] ixgbe 0000:42:00.1: irq 161 for MSI/MSI-X
[    5.680785] ixgbe 0000:42:00.1: irq 162 for MSI/MSI-X
[    5.680793] ixgbe 0000:42:00.1: irq 163 for MSI/MSI-X
[    5.680799] ixgbe 0000:42:00.1: irq 164 for MSI/MSI-X
[    5.680805] ixgbe 0000:42:00.1: irq 165 for MSI/MSI-X
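The allocation above can be summarised by collapsing each port's IRQ numbers into contiguous runs, which makes the gap at 148 (the "semi-sparse" part) easy to see. This is a minimal Python sketch, assuming the log lines keep the `ixgbe <PCI address>: irq <N> for MSI/MSI-X` shape shown; the helper names `parse_msix` and `to_ranges` are hypothetical, and the sample input is rebuilt from the IRQ numbers in the log above with the timestamps omitted.

```python
import re
from collections import defaultdict

# Matches e.g. "ixgbe 0000:42:00.0: irq 131 for MSI/MSI-X" anywhere in a dmesg line
MSIX_RE = re.compile(r"ixgbe (\S+): irq (\d+) for MSI/MSI-X")


def parse_msix(lines):
    """Map each PCI function (e.g. 0000:42:00.0) to its sorted list of IRQ numbers."""
    irqs = defaultdict(list)
    for line in lines:
        m = MSIX_RE.search(line)
        if m:
            irqs[m.group(1)].append(int(m.group(2)))
    return {dev: sorted(nums) for dev, nums in irqs.items()}


def to_ranges(nums):
    """Collapse a sorted list of integers into inclusive (start, end) runs."""
    runs = []
    for n in nums:
        if runs and n == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], n)  # extend the current run
        else:
            runs.append((n, n))  # start a new run
    return runs


if __name__ == "__main__":
    # IRQ numbers copied from the log above (timestamps dropped)
    sample = [f"ixgbe 0000:42:00.0: irq {n} for MSI/MSI-X" for n in range(131, 148)]
    sample += [f"ixgbe 0000:42:00.1: irq {n} for MSI/MSI-X" for n in range(149, 166)]
    for dev, nums in parse_msix(sample).items():
        print(dev, len(nums), "IRQs", to_ranges(nums))
```

Each port ends up with 17 IRQs in a single run, and the two runs are separated by one unused number.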

Events 1-5

[125854.223448] do_IRQ: 11.158 No irq handler for vector (irq -1)
[196961.961127] do_IRQ: 0.172 No irq handler for vector (irq -1)
[294403.918076] do_IRQ: 6.184 No irq handler for vector (irq -1)
[316673.912452] do_IRQ: 10.138 No irq handler for vector (irq -1)
[336752.116329] do_IRQ: 9.160 No irq handler for vector (irq -1)
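Each `do_IRQ` line carries two numbers before the message. In the x86 `do_IRQ()` of the 3.10-based RHEL 7 kernel they are printed as `CPU.vector`, so `11.158` means vector 158 arrived on CPU 11, and `irq -1` means that CPU's vector table had no IRQ mapped to that vector. A minimal Python sketch that extracts the timestamp, CPU and vector from lines like the ones above; the `parse_do_irq` function and `SpuriousVector` type are hypothetical names chosen for illustration.

```python
import re
from typing import NamedTuple

# Matches e.g. "[125854.223448] do_IRQ: 11.158 No irq handler for vector (irq -1)"
DO_IRQ_RE = re.compile(r"\[\s*([\d.]+)\] do_IRQ: (\d+)\.(\d+) No irq handler")


class SpuriousVector(NamedTuple):
    uptime_s: float  # seconds since boot, from the dmesg timestamp
    cpu: int         # CPU that received the interrupt
    vector: int      # per-CPU vector number (not a Linux IRQ number)


def parse_do_irq(lines):
    """Return one SpuriousVector per matching do_IRQ line, in input order."""
    events = []
    for line in lines:
        m = DO_IRQ_RE.search(line)
        if m:
            events.append(SpuriousVector(float(m.group(1)), int(m.group(2)), int(m.group(3))))
    return events


if __name__ == "__main__":
    log = """\
[125854.223448] do_IRQ: 11.158 No irq handler for vector (irq -1)
[196961.961127] do_IRQ: 0.172 No irq handler for vector (irq -1)
[294403.918076] do_IRQ: 6.184 No irq handler for vector (irq -1)
[316673.912452] do_IRQ: 10.138 No irq handler for vector (irq -1)
[336752.116329] do_IRQ: 9.160 No irq handler for vector (irq -1)"""
    for ev in parse_do_irq(log.splitlines()):
        print(f"t={ev.uptime_s:.0f}s cpu={ev.cpu} vector={ev.vector}")
```

Feeding it the full `dmesg` output instead of this excerpt would pick up every event without hand-copying.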

dmesg summary (tl;dr version)

Six events have occurred, the first roughly 1.46 days after startup (SR-IOV enabled in BIOS). The 196961 and 294403 events impacted ssh traffic (a broken pipe and an ssh disconnect, respectively). Both of these events have vectors (172 and 184) outside the semi-sparse range 131-165 (131-147 and 149-165) that had been allocated for ixgbe MSI/MSI-X by 5.68 seconds after startup.

So, why would the generated interrupt be outside the set of vectors reserved for ixgbe MSI/MSI-X?
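The comparison in the summary can be reproduced mechanically. One caveat, offered as general x86 behaviour rather than a diagnosis: the boot log lists Linux IRQ numbers, while `do_IRQ` reports per-CPU vector numbers, and the kernel maps between the two through a per-CPU table, so a vector's value need not fall inside the IRQ-number range even for an interrupt that belongs to ixgbe. The sketch below only checks numeric membership; `IXGBE_IRQS`, `EVENT_VECTORS` and `outside` are hypothetical names, and the values are copied from events 1-5.

```python
# IRQ numbers logged for the two ixgbe ports (131-147 and 149-165; 148 unused)
IXGBE_IRQS = set(range(131, 148)) | set(range(149, 166))
# Second number of each "do_IRQ: CPU.VECTOR" line for events 1-5
EVENT_VECTORS = [158, 172, 184, 138, 160]


def outside(values, allocated):
    """Return the values that are not in the allocated set, preserving order."""
    return [v for v in values if v not in allocated]


if __name__ == "__main__":
    # Caveat: this compares vector numbers against IRQ numbers, which the
    # kernel keeps in separate namespaces; the result is purely numeric.
    print("outside 131-147/149-165:", outside(EVENT_VECTORS, IXGBE_IRQS))
```

It flags 172 and 184, the same two values called out in the summary.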

Event #6 - 1 broken ssh connection

[417009.997136] do_IRQ: 11.113 No irq handler for vector (irq -1)

Events #7 - 11 - terminal refresh required

[418974.144133] do_IRQ: 10.56 No irq handler for vector (irq -1)
[423791.904227] do_IRQ: 8.206 No irq handler for vector (irq -1)
[515232.627049] do_IRQ: 18.154 No irq handler for vector (irq -1)
[515575.977574] do_IRQ: 10.218 No irq handler for vector (irq -1)
[628572.962615] do_IRQ: 9.168 No irq handler for vector (irq -1)

Notes: inconsistent timing, very odd values. It just feels like l-value before r-value to me.
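To put numbers on the inconsistent timing, the dmesg timestamps (seconds since boot) can be converted to days after boot and to the interval between consecutive events. A minimal sketch using only the eleven timestamps posted above; it assumes all eleven events come from the same boot, since dmesg timestamps restart at zero after a reboot. `gaps_hours` is a hypothetical helper name.

```python
# dmesg timestamps (seconds since boot) for events 1-11 as posted
TIMESTAMPS = [
    125854.223448, 196961.961127, 294403.918076, 316673.912452,
    336752.116329, 417009.997136, 418974.144133, 423791.904227,
    515232.627049, 515575.977574, 628572.962615,
]

SECONDS_PER_DAY = 86400


def gaps_hours(ts):
    """Hours elapsed between each pair of consecutive events."""
    return [(b - a) / 3600 for a, b in zip(ts, ts[1:])]


if __name__ == "__main__":
    for i, t in enumerate(TIMESTAMPS, start=1):
        print(f"event {i:2d}: {t / SECONDS_PER_DAY:5.2f} days after boot")
    gaps = gaps_hours(TIMESTAMPS)
    print(f"gaps: min {min(gaps):.1f} h, max {max(gaps):.1f} h")
```

The gaps range from a few minutes (events 9 and 10) to more than a day, which fits the "inconsistent timing" observation rather than any fixed period.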

Action outside the scope of the RHEL Server has been taken, and stable operation has been restored after a restart. Appropriate feedback has been provided. Should you experience similar issues, please feel free to reach out by posting to this discussion. It may take some time for the issue to be properly addressed.