irqbalance does not balance the IRQ correctly if the underlying network device resets
Environment
- Red Hat Enterprise Linux (RHEL) 6.9 and earlier
- Red Hat Enterprise Linux (RHEL) 7.4 and earlier
- irqbalance-1.0.7-8
- Network device reset
  - Common on Amazon AWS with the enavf or ena driver
  - Any other NIC driver could produce it under the right conditions
Issue
- If an interrupt channel (IRQ) disappears and later reappears (as happens frequently on AWS with the ena driver), the IRQ is not balanced correctly. The new interrupt count is smaller than the previously recorded one, so the difference between the two overflows in irq_count.
- The issue is readily reproducible on Amazon AWS VMs with an enavf device under high network load.
- The following messages are logged:

      kernel: [4293535.378166] ena 0000:00:03.0: eth0: Transmit time out
      kernel: [4293551.684567] ena: ena device version: 0.10
      kernel: [4293551.686344] ena: ena controller version: 0.0.1 implementation version 1
      kernel: [4293553.104073] ena 0000:00:03.0: irq 48 for MSI/MSI-X
      kernel: [4293553.104916] ena 0000:00:03.0: irq 56 for MSI/MSI-X
      kernel: [4293553.111858] ena 0000:00:03.0: Device reset completed successfully
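
The overflow described in this section can be illustrated with a small Python model. This is only a sketch, not irqbalance source code: the 64-bit counter width, the function names, and the sample counts are assumptions made for illustration. It shows why subtracting a larger previous count from a smaller new count yields a huge unsigned value, and one possible guard that treats a decreasing counter as a sign that the IRQ was re-created.

```python
# Assumption: interrupt counters behave like 64-bit unsigned integers.
MASK64 = (1 << 64) - 1


def unsigned_delta(current, previous):
    """Return current - previous as 64-bit unsigned arithmetic would compute it."""
    return (current - previous) & MASK64


def counter_was_reset(current, previous):
    """A cumulative counter should never decrease; if it does, assume the IRQ was re-created."""
    return current < previous


# Hypothetical sample values: count before the device reset and after the IRQ reappears.
previous_count = 5_000_000
current_count = 1_200

if counter_was_reset(current_count, previous_count):
    print("counter went backwards; discard the old baseline instead of subtracting")
else:
    print("interrupts since last sample:", unsigned_delta(current_count, previous_count))

# Without the guard, the difference wraps around to an enormous value.
print("wrapped difference:", unsigned_delta(current_count, previous_count))
```

With these sample values the wrapped difference is 2**64 - 4998800, which would make the IRQ appear far busier than it is.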
Resolution
- RHEL 7.5: Update to irqbalance-1.0.7-11.el7, released with Errata RHBA-2018:1046.
- RHEL 7.4.z: Update to irqbalance-1.0.7-10.el7_4.1 when available.
- RHEL 6.10: Update to irqbalance-1.0.7-9.el6, released with Errata RHBA-2018:1896.
- RHEL 6.9.z: Update to irqbalance-1.0.7-8.el6_9.1, released with Errata RHBA-2018:0514.
- RHEL 6.7.z EUS: Update to irqbalance-1.0.7-5.el6_7.1, released with Errata RHBA-2018:0495.
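
To check whether an installed package already contains the fix, the version-release string can be compared against the builds listed above. The following Python sketch is a simplified comparison under stated assumptions: it is not the full RPM version comparison algorithm (it ignores epochs and the special `~` and `^` characters), and the names `FIXED_BUILDS` and `has_fix` are hypothetical. The installed string can be obtained with `rpm -q --qf '%{VERSION}-%{RELEASE}\n' irqbalance`.

```python
import re

# Fixed builds from the Resolution list, keyed by RHEL stream.
FIXED_BUILDS = {
    "7.5": "1.0.7-11.el7",
    "7.4.z": "1.0.7-10.el7_4.1",
    "6.10": "1.0.7-9.el6",
    "6.9.z": "1.0.7-8.el6_9.1",
    "6.7.z EUS": "1.0.7-5.el6_7.1",
}


def _segments(text):
    """Split a string into runs of digits or letters; separators are dropped."""
    return re.findall(r"[0-9]+|[A-Za-z]+", text)


def _compare_part(a, b):
    """Return -1, 0, or 1 when comparing two version or two release strings."""
    seg_a, seg_b = _segments(a), _segments(b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        if x.isdigit():
            x, y = int(x), int(y)
        if x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the string with more segments is newer.
    return (len(seg_a) > len(seg_b)) - (len(seg_a) < len(seg_b))


def has_fix(installed, fixed):
    """Return True if the installed 'version-release' is at least the fixed one."""
    inst_ver, inst_rel = installed.split("-", 1)
    fix_ver, fix_rel = fixed.split("-", 1)
    result = _compare_part(inst_ver, fix_ver)
    if result == 0:
        result = _compare_part(inst_rel, fix_rel)
    return result >= 0


print(has_fix("1.0.7-8.el6", FIXED_BUILDS["6.9.z"]))
print(has_fix("1.0.7-11.el7", FIXED_BUILDS["7.5"]))
```

The comparison is only meaningful within one RHEL stream, so pick the `FIXED_BUILDS` entry that matches the release installed on the system.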
Root Cause
This issue was fixed in upstream irqbalance with commit 93ed801.
This was backported to RHEL 7.5 with Red Hat Bug 1536373, to RHEL 7.4.z with Red Hat Bug 1542450, to RHEL 6.10 with Red Hat Bug 1536370, to RHEL 6.9.z with Red Hat Bug 1541290, and to RHEL 6.7.z EUS with Red Hat Bug 1541293.
- RHEL 7.5: https://bugzilla.redhat.com/show_bug.cgi?id=1536373
- RHEL 7.4.z: https://bugzilla.redhat.com/show_bug.cgi?id=1542450
- RHEL 6.10: https://bugzilla.redhat.com/show_bug.cgi?id=1536370
- RHEL 6.9.z: https://bugzilla.redhat.com/show_bug.cgi?id=1541290
- RHEL 6.7.z EUS: https://bugzilla.redhat.com/show_bug.cgi?id=1541293