irqbalance does not balance the IRQ correctly if the underlying network device resets

Solution Verified - Updated 2024-06-14T16:46:26+00:00

Environment

Red Hat Enterprise Linux (RHEL) 6.9 and earlier
Red Hat Enterprise Linux (RHEL) 7.4 and earlier
irqbalance-1.0.7-8
Network device reset
- Common on Amazon AWS with enavf or ena driver
- Any other NIC driver can trigger it under the right conditions

Issue

If an interrupt channel (IRQ) disappears and reappears later (as happens frequently on AWS with the ena driver), irqbalance does not balance that IRQ correctly. When the IRQ reappears, its counter is smaller than the value previously recorded in irq_count, so the difference computed between the two values overflows.
The issue is readily reproducible on Amazon AWS VMs with an enavf device under high network load.
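
The overflow described above can be illustrated with a minimal sketch. It assumes the counter is an unsigned 64-bit value and emulates C-style unsigned subtraction with a mask. The function names naive_delta and guarded_delta are hypothetical and are not taken from the irqbalance source; the guarded variant is one possible way to handle a restarted counter, not necessarily what the upstream fix does.

```python
# Emulate unsigned 64-bit arithmetic as a C program would perform it.
# Assumption: the counter is 64 bits wide; the width is illustrative.
U64_MASK = (1 << 64) - 1


def naive_delta(previous: int, current: int) -> int:
    """Subtract two counters with unsigned 64-bit wraparound."""
    return (current - previous) & U64_MASK


def guarded_delta(previous: int, current: int) -> int:
    """Treat a counter that moved backwards as restarted from zero."""
    if current < previous:
        return current
    return current - previous


if __name__ == "__main__":
    before_reset = 1_000_000  # count recorded before the device reset
    after_reset = 250         # count read after the IRQ reappeared
    print("naive  :", naive_delta(before_reset, after_reset))
    print("guarded:", guarded_delta(before_reset, after_reset))
```

With the naive subtraction the IRQ appears to have fired close to 2^64 times in one interval, which distorts any load estimate built on that difference.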

The following messages are logged:

kernel: [4293535.378166] ena 0000:00:03.0: eth0: Transmit time out

kernel: [4293551.684567] ena: ena device version: 0.10
kernel: [4293551.686344] ena: ena controller version: 0.0.1 implementation version 1
kernel: [4293553.104073] ena 0000:00:03.0: irq 48 for MSI/MSI-X


kernel: [4293553.104916] ena 0000:00:03.0: irq 56 for MSI/MSI-X
kernel: [4293553.111858] ena 0000:00:03.0: Device reset completed successfully
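
To check whether a reset like the one logged above left an IRQ with a smaller counter, two snapshots of /proc/interrupts can be compared. The following is a rough sketch, not a Red Hat tool: the helper names parse_interrupts and regressed_irqs are hypothetical, and the sample snapshots (counts and the eth0-Tx-Rx-N names) are made up for illustration.

```python
from typing import Dict, List

# Made-up snapshots in the /proc/interrupts layout; all values are illustrative.
SAMPLE_BEFORE = """\
           CPU0       CPU1
 48:       9000       8000   PCI-MSI-edge      eth0-Tx-Rx-0
 56:        500        700   PCI-MSI-edge      eth0-Tx-Rx-1
NMI:          0          0   Non-maskable interrupts
"""
SAMPLE_AFTER = """\
           CPU0       CPU1
 48:         12          3   PCI-MSI-edge      eth0-Tx-Rx-0
 56:        900        950   PCI-MSI-edge      eth0-Tx-Rx-1
NMI:          0          0   Non-maskable interrupts
"""


def parse_interrupts(text: str) -> Dict[str, int]:
    """Return {irq_number: total count across CPUs} for numbered IRQ rows."""
    lines = text.strip("\n").splitlines()
    if not lines:
        return {}
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals: Dict[str, int] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip named rows such as NMI or ERR
        counts = rest.split()[:cpu_count]
        totals[label] = sum(int(c) for c in counts if c.isdigit())
    return totals


def regressed_irqs(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """List IRQs present in both snapshots whose total count went down."""
    shared = before.keys() & after.keys()
    return sorted((irq for irq in shared if after[irq] < before[irq]), key=int)


if __name__ == "__main__":
    before = parse_interrupts(SAMPLE_BEFORE)
    after = parse_interrupts(SAMPLE_AFTER)
    print("IRQs whose counters went backwards:", regressed_irqs(before, after))
```

On a live system the two snapshots would come from reading /proc/interrupts before and after the reset instead of from the sample strings.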

Resolution

RHEL 7.5: Update to irqbalance-1.0.7-11.el7 released with Errata RHBA-2018:1046.
RHEL 7.4.z: Update to irqbalance-1.0.7-10.el7_4.1 when available.
RHEL 6.10: Update to irqbalance-1.0.7-9.el6 released with Errata RHBA-2018:1896.
RHEL 6.9.z: Update to irqbalance-1.0.7-8.el6_9.1 released with Errata RHBA-2018:0514.
RHEL 6.7.z EUS: Update to irqbalance-1.0.7-5.el6_7.1 released with Errata RHBA-2018:0495.

Root Cause

This issue was fixed in upstream irqbalance with commit 93ed801.

This was backported to RHEL 7.5 with Red Hat Bug 1536373, to RHEL 7.4.z with Red Hat Bug 1542450, to RHEL 6.10 with Red Hat Bug 1536370, to RHEL 6.9.z with Red Hat Bug 1541290, and to RHEL 6.7.z EUS with Red Hat Bug 1541293.

RHEL 7.5: https://bugzilla.redhat.com/show_bug.cgi?id=1536373
RHEL 7.4.z: https://bugzilla.redhat.com/show_bug.cgi?id=1542450
RHEL 6.10: https://bugzilla.redhat.com/show_bug.cgi?id=1536370
RHEL 6.9.z: https://bugzilla.redhat.com/show_bug.cgi?id=1541290
RHEL 6.7.z EUS: https://bugzilla.redhat.com/show_bug.cgi?id=1541293

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
