Why is irqbalance not balancing interrupts?
Environment
- Red Hat Enterprise Linux 6
- kernel versions kernel-2.6.32-358.el6, kernel-2.6.32-358.0.1.el6 or kernel-2.6.32-358.2.1.el6
- irqbalance package versions irqbalance-1.0.4-3.el6, irqbalance-1.0.4-4.el6_4 or irqbalance-1.0.4-6.el6
- Multiple CPU cores
- irqbalance service managing interrupts
Issue
- Why is irqbalance not balancing interrupts?
- IRQs are sitting all on one CPU core or two CPU cores.
- In ethtool -S and ifconfig output there are packet drops and discards on the network interface.
- The following is printed in syslog or /var/log/messages:
    irqbalance: WARNING: MSI interrupts found in /proc/interrupts
    irqbalance: But none found in sysfs, you need to update your kernel
    irqbalance: Until then, IRQs will be improperly classified
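To check a log file for the warning above, a minimal sketch follows. The function name has_msi_warning is hypothetical, and the default path assumes syslog writes to /var/log/messages as described above.

```shell
# Hypothetical helper: exit status 0 if the irqbalance MSI warning
# appears in the given log file (default: /var/log/messages).
has_msi_warning() {
    grep -q 'irqbalance: WARNING: MSI interrupts found in /proc/interrupts' \
        "${1:-/var/log/messages}"
}

# Example use (run as root so the log is readable):
#   has_msi_warning && echo "irqbalance MSI warning present"
```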
Resolution
Ensure that a kernel package later than kernel-2.6.32-358.2.1.el6 is in use.
Ensure that an irqbalance package later than the following is in use:
- RHEL 6.6: Package irqbalance-1.0.4-10.el6 on Errata RHBA-2014:1504-1
- RHEL 6.5z: Package irqbalance-1.0.4-8.el6_5 on Errata RHBA-2014:0096-1
- RHEL 6.4z: Package irqbalance-1.0.4-6.el6_4 on Errata RHBA-2014:0095-5
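The installed versions can be compared against the ones listed above. The sketch below uses a hypothetical helper, version_newer_than, built on GNU sort -V. This only approximates RPM's own version ordering, so treat the result as a hint and confirm with rpm -q.

```shell
# Hypothetical helper: succeeds if version string $1 sorts strictly after $2.
# Assumption: GNU "sort -V" ordering is close enough to RPM ordering here.
version_newer_than() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example use on the affected system:
#   version_newer_than "$(uname -r)" 2.6.32-358.2.1.el6 && echo "kernel OK"
#   version_newer_than "$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' irqbalance)" \
#       1.0.4-6.el6_4 && echo "irqbalance OK"
```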
Root Cause
- Previously, the irqbalance daemon did not consider the NUMA node assignment for an IRQ (interrupt request) for the banned CPU set. Consequently, irqbalance set the affinity incorrectly when the IRQBALANCE_BANNED_CPUS variable was set to a single CPU. In addition, IRQs could not be assigned to a node that had no eligible CPUs. Node assignment has been restricted to nodes that have eligible CPUs as defined by the unbanned_cpus bitmask, thus fixing the bug. As a result, irqbalance now sets affinity properly, and IRQs are assigned to the respective nodes correctly. (BZ#1054590, BZ#1054591)
- Prior to this update, the irqbalance package declared a dependency on the wrong kernel version. As a consequence, irqbalance could not balance IRQs on NUMA systems. With this update, the dependency has been fixed, and IRQs are now balanced correctly on NUMA systems. Note that users of the irqbalance packages have to update the kernel to 2.6.32-358.2.1 or later for the irqbalance daemon to work correctly. (BZ#1055572, BZ#1055574)
- Before this update, irqbalance could not accurately determine the NUMA node to which an IRQ was local or the device to which an IRQ was sent. The kernel affinity_hint values were created to work around this issue. With this update, irqbalance can parse all the information about an IRQ that sysfs provides. IRQ balancing now works correctly, and the affinity_hint values are now ignored by default so that they do not distort irqbalance's behavior. (BZ#1093441, BZ#1093440)
Diagnostic Steps
- Interrupts are not balancing:
               CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
     59: 1292013110          0          0          0          0          0   PCI-MSI-edge   eth0-rxtx-0
     60:  851840288          0          0          0          0          0   PCI-MSI-edge   eth0-rxtx-1
     61:  843207989          0          0          0          0          0   PCI-MSI-edge   eth0-rxtx-2
     62:  753317489          0          0          0          0          0   PCI-MSI-edge   eth0-rxtx-3

    $ grep eth /proc/interrupts
     71:    2073421    5816340 ...lots of zeroes... PCI-MSI-edge eth11-q0
     72:     294863     114392 ...lots of zeroes... PCI-MSI-edge eth11-q1
     73:      63206     234005 ...lots of zeroes... PCI-MSI-edge eth11-q2
     74:     238342      72189 ...lots of zeroes... PCI-MSI-edge eth11-q3
     79:    1491483        699 ...lots of zeroes... PCI-MSI-edge eth9-q0
     80:          1     525546 ...lots of zeroes... PCI-MSI-edge eth9-q1
     81:    1524075          5 ...lots of zeroes... PCI-MSI-edge eth9-q2
     82:          9    1869645 ...lots of zeroes... PCI-MSI-edge eth9-q3
- The irqbalance service is turned on and running:
    $ chkconfig | grep irqb
    irqbalance      0:off   1:off   2:off   3:on    4:on    5:on    6:off
    $ grep irqb ps
    root      1480  0.0  0.0  10948   668 ?        Ss   Oct31   4:27 irqbalance
- There's no additional irqbalance config:
    $ egrep -v "^#" /etc/sysconfig/irqbalance
    $ grep: /etc/sysconfig/irqbalance: No such file or directory
- Interrupts are allowed to land on other/all CPU cores:
    $ for i in {59..62}; do echo -n "Interrupt $i is allowed on CPUs "; cat /proc/irq/$i/smp_affinity_list; done
    Interrupt 59 is allowed on CPUs 0-5
    Interrupt 60 is allowed on CPUs 0-5
    Interrupt 61 is allowed on CPUs 0-5
    Interrupt 62 is allowed on CPUs 0-5

    $ for i in {71..82}; do echo -n " IRQ $i: "; cat /proc/irq/$i/smp_affinity_list; done
     IRQ 71: 1,3,5,7,9,11,13,15,17,19,21,23
     IRQ 72: 0,2,4,6,8,10,12,14,16,18,20,22
     IRQ 73: 1,3,5,7,9,11,13,15,17,19,21,23
     IRQ 74: 0,2,4,6,8,10,12,14,16,18,20,22
     IRQ 79: 0,2,4,6,8,10,12,14,16,18,20,22
     IRQ 80: 1,3,5,7,9,11,13,15,17,19,21,23
     IRQ 81: 0,2,4,6,8,10,12,14,16,18,20,22
     IRQ 82: 1,3,5,7,9,11,13,15,17,19,21,23
- Processors do not share cache locality, which stops irqbalance from working by design:
    $ for i in {0..3}; do for j in {0..7}; do echo -n "cpu$j, index $i: "; cat /sys/devices/system/cpu/cpu$j/cache/index$i/shared_cpu_map; done; done
    cpu0, index 0: 00000001
    cpu1, index 0: 00000002
    cpu2, index 0: 00000004
    cpu3, index 0: 00000008
    cpu4, index 0: 00000010
    cpu5, index 0: 00000020
    cpu6, index 0: 00000040
    cpu7, index 0: 00000080
    cpu0, index 1: 00000001
    cpu1, index 1: 00000002
    cpu2, index 1: 00000004
    cpu3, index 1: 00000008
    cpu4, index 1: 00000010
    cpu5, index 1: 00000020
    cpu6, index 1: 00000040
    cpu7, index 1: 00000080
    cpu0, index 2: 00000001
    cpu1, index 2: 00000002
    cpu2, index 2: 00000004
    cpu3, index 2: 00000008
    cpu4, index 2: 00000010
    cpu5, index 2: 00000020
    cpu6, index 2: 00000040
    cpu7, index 2: 00000080
    cpu0, index 3: 00000001
    cpu1, index 3: 00000002
    cpu2, index 3: 00000004
    cpu3, index 3: 00000008
    cpu4, index 3: 00000010
    cpu5, index 3: 00000020
    cpu6, index 3: 00000040
    cpu7, index 3: 00000080
- Top users of CPU & MEM:
    USER    %CPU    %MEM   RSS
    oracle  204.9%  12.2%  5.34 GiB
- An Oracle instance process appears in the list of uninterruptible sleep & defunct processes (STAT Ds):
    USER    PID    %CPU  %MEM  VSZ-MiB  RSS-MiB  TTY  STAT  START  TIME  COMMAND
    oracle  17631  10.5  0.2   260      57       ?    Ds    22:10  3:17  ora_j000_INDWS
- The irqbalance service reports a "Resource temporarily unavailable" error in lsof -b output:
    lsof | grep -i irqbalance
    lsof: avoiding stat(/usr/sbin/irqbalance): -b was specified.
    irqbalanc 1480 0 txt unknown /usr/sbin/irqbalance (stat: Resource temporarily unavailable)
    irqbalanc 1480 0 mem REG 8,2 423023 /usr/sbin/irqbalance (stat: Resource temporarily unavailable)
  However, the -b option or the presence of an NFS mount prevents lsof from calling stat() directly. lsof calls statsafely() instead, so this message is expected:
    statsafely(path, buf)
        char *path;             /* file path */
        struct stat *buf;       /* stat buffer address */
    {
        if (Fblock) {
            if (!Fwarn)
                (void) fprintf(stderr,
                    "%s: avoiding stat(%s): -b was specified.\n",
                    Pn, path);
            errno = EWOULDBLOCK;
            return(1);
        }
        return(doinchild(dostat, path, (char *)buf, sizeof(struct stat)));
    }
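  When the -b option is specified, Fblock is true, so statsafely() sets errno to EWOULDBLOCK (the same value as EAGAIN on Linux) and returns 1 without calling stat(). The error string for that errno is "Resource temporarily unavailable", which is what lsof prints.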
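Two of the checks above can be scripted. The following is a minimal sketch, not part of irqbalance or any Red Hat tooling. The helper names single_cpu_irqs and mask_to_cpus and their argument order are assumptions made for illustration. The first helper lists IRQs whose counts all sit on one CPU, as in the first /proc/interrupts sample. The second decodes a shared_cpu_map hex mask into a CPU list; a mask with a single bit set means that cache level is not shared with any other CPU.

```shell
# Hypothetical helpers; names and argument order are illustrative only.

# Print "IRQ name" for each numbered IRQ matching a pattern whose counts
# are non-zero on exactly one CPU.
#   $1 = regex to match (for example: eth)
#   $2 = file to read (default: /proc/interrupts)
single_cpu_irqs() {
    awk -v pat="$1" '
        NR == 1 { ncpu = NF; next }          # header row: one field per CPU
        $1 ~ /^[0-9]+:$/ && $0 ~ pat {
            busy = 0
            for (i = 2; i <= ncpu + 1; i++)  # per-CPU counters follow the IRQ number
                if ($i + 0 > 0) busy++
            irq = $1
            sub(/:$/, "", irq)
            if (busy == 1) print irq, $NF
        }' "${2:-/proc/interrupts}"
}

# Decode a hex CPU bitmask (as found in shared_cpu_map) into a CPU list.
# Assumption: the mask fits in the shell integer type (fewer than 64 CPUs).
mask_to_cpus() {
    hex=$(printf '%s' "$1" | tr -d ',')
    mask=$(( 0x$hex ))
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out,}$cpu"
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    printf '%s\n' "$out"
}
```

For example, mask_to_cpus 00000010 prints 4, matching the cpu4 entries in the cache output above. Run against data like the first interrupts sample, single_cpu_irqs eth would be expected to list IRQs 59 through 62. The helper ignores the "...lots of zeroes..." abbreviation used in the second sample, so it needs the real file as input.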
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.