NICs with RX acceleration (GRO/LRO/TPA/etc) may suffer from bad TCP performance

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux versions:

    • Broadcom bnx2x module prior to RHEL 5.7 (kernel earlier than 2.6.18-274.el5)
    • QLogic NetXen netxen_nic module prior to RHEL 5.9 (kernel earlier than 2.6.18-348.el5)
    • Intel 10Gbps ixgbe module prior to RHEL 6.4 (kernel earlier than 2.6.32-358.el6)
    • Intel 10Gbps ixgbe module from RHEL 5.6 (kernel version 2.6.18-238.el5 and later)
  • Receive offloading enabled on network interface

Issue

Network interface cards (NICs) with receive (RX) acceleration (GRO, LRO, TPA, etc.) may suffer from poor TCP performance. Observed effects include:

  • NFS transfers over 10Gbps links reach only about 100MiB/sec (i.e. roughly 1Gbps)
  • TCP connections never get anywhere near wire speed
  • In tcpdump captures, the TCP window clamps down to a small value such as 720 bytes and never recovers

Resolution

Solution

Upgrade to the following kernel versions:

  • Broadcom bnx2x - RHEL 5.7 kernel-2.6.18-274.el5
  • QLogic NetXen netxen_nic - RHEL 5.9 kernel-2.6.18-348.el5
  • Intel 10Gbps ixgbe - RHEL 6.4 kernel-2.6.32-358.el6
  • There is no resolution on RHEL5 for Intel 10Gbps ixgbe

Workaround

  • Disable GRO/LRO/TPA or other RX/receive accelerations.

  • For other NICs, receive offloading can be turned off with ethtool, using commands such as:

# ethtool -K eth0 gro off
# ethtool -K eth0 lro off
  • Settings made with ethtool are lost on reboot. To make them persistent, place the ethtool commands in an /sbin/ifup-local script.

  • For bnx2x, offloading can be controlled by a module option in /etc/modprobe.conf:

options bnx2x disable_tpa=1

A module option such as this requires a module reload or a reboot to apply.
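
The /sbin/ifup-local script suggested in the workaround could look like the following minimal sketch. It relies on the initscripts running /sbin/ifup-local, if it is executable, with the interface name as the first argument once an interface is up. The interface list (eth0), the ethtool path, and the helper name disable_rx_offload are assumptions for illustration and should be adapted to the affected system.

```shell
#!/bin/sh
# Sketch of /sbin/ifup-local. The initscripts run this file, if it is
# executable, after an interface comes up, passing the interface name as $1.

# Assumption: eth0 is the affected interface; list every NIC that needs it.
OFFLOAD_IFACES="eth0"

# Assumption: ethtool lives in /sbin; override ETHTOOL for a dry run.
ETHTOOL="${ETHTOOL:-/sbin/ethtool}"

disable_rx_offload() {
    # Turn off each feature separately so that a driver lacking one of
    # them still gets the other disabled.
    for feature in gro lro; do
        "$ETHTOOL" -K "$1" "$feature" off ||
            echo "ifup-local: could not disable $feature on $1" >&2
    done
}

# Act only when the interface being brought up is in the list.
for iface in $OFFLOAD_IFACES; do
    if [ "${1:-}" = "$iface" ]; then
        disable_rx_offload "$iface"
    fi
done
```

The file must be executable (for example, chmod 755 /sbin/ifup-local). Running ethtool -k eth0 after the interface has been restarted shows whether the features are off.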

Root Cause

For netxen_nic specifically:

Due to incorrect information provided by firmware, the netxen_nic driver did
not calculate the correct Generic Segmentation Offload (GSO) length of
packets that were received using the Large Receive Offload (LRO)
optimization. This caused network traffic flow to be extensively delayed for
NICs using LRO on netxen_nic, which had a huge impact on NIC performance
(in some cases, throughput for some 1 Gb NICs could be below 100 kbs). With
this update, firmware now provides the correct GSO packet length and the
netxen_nic driver has been modified to handle new information provided by
firmware correctly. Throughput of the NICs using the LRO optimization with
the netxen_nic driver is now within expected levels.

Diagnostic Steps

Two main effects were observed:

  • The TCP window never grows large enough (staying as low as 720 bytes). This can be checked in ordinary traffic captures.
  • The RHEL host may delay up to 40ms before sending an awaited ACK - improper activation of the delayed ACK mechanism. To observe this, open a capture of the TCP connection in Wireshark and open IO Graphs, with 0.1 second per tick and displaying packets/tick. The graph shows valleys where delayed ACKs occur.
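
A rough calculation shows why these two effects are so damaging. When a sender can have at most one receive window in flight per round trip, throughput is bounded by window size divided by round-trip time. The sketch below applies that bound to the 720-byte window and 40ms delay reported above. The helper name window_limited_bps and the 0.2ms LAN round-trip time are assumptions for illustration, not measurements from this issue.

```shell
# Upper bound on TCP throughput when at most one window is in flight per
# round trip: bits/sec = window_bytes * 8 / round_trip_seconds.
window_limited_bps() {
    awk -v w="$1" -v r="$2" 'BEGIN { printf "%.0f\n", (w * 8) / r }'
}

# 720-byte window, each round trip stalled by a 40 ms delayed ACK
window_limited_bps 720 0.040

# 720-byte window, hypothetical 0.2 ms LAN round trip with prompt ACKs
window_limited_bps 720 0.0002
```

Under these assumptions the bound is about 144 kbit/s in the first case and about 28.8 Mbit/s in the second, both far below the wire speed of a 1Gbps or 10Gbps link.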

Check if receive offloading is enabled:

$ grep 'receive-offload' sos_commands/networking/ethtool_-k_eth0 | grep ': on'
generic-receive-offload: on
large-receive-offload: on

Try disabling it and check whether performance improves:

# ethtool -K eth0 gro off lro off
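
On a live system with several NICs, the same check can be run across all interfaces. The following is a minimal sketch that assumes interfaces are listed under /sys/class/net and that ethtool -k prints lines in the "feature: on" form shown above. The function names rx_offloads_on and report_rx_offloads are hypothetical.

```shell
# Filter `ethtool -k` output (read from stdin) down to the receive-offload
# features that are currently on.
rx_offloads_on() {
    awk -F': ' '/receive-offload/ && $2 ~ /^on/ {
        sub(/^[ \t]+/, "", $1)   # tolerate leading whitespace
        print $1
    }'
}

# Print "<interface>: <feature>" for every non-loopback interface.
report_rx_offloads() {
    for dev in /sys/class/net/*; do
        iface=${dev##*/}
        [ "$iface" = "lo" ] && continue
        # Entries that are not real devices make ethtool fail; ignore them.
        ethtool -k "$iface" 2>/dev/null | rx_offloads_on |
            while read -r feature; do
                echo "$iface: $feature"
            done
    done
}
```

Calling report_rx_offloads prints one line per enabled receive offload. The filter can also be fed a saved file, for example rx_offloads_on < sos_commands/networking/ethtool_-k_eth0.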


5 Comments

This article also appears to be relevant to RHEL 4.8.

For the virt use case (i.e. bridging), bz 596385 was fixed in 2.6.18-194.6.1.el5: LRO is automatically disabled on a NIC when it is added to the bridge.

How do we add this parameter on RHEL 6, as there is no /etc/modprobe.conf file?

This is also necessary for RHEL6. To configure this, create your own file in /etc/modprobe.d/, such as /etc/modprobe.d/bnx2-TPA-disable.conf.

Is there anything similar for the bnx2 driver? We are seeing some pretty bad performance as a result of not being able to disable delayed ACK for our iSCSI interfaces. It seems VMware and Microsoft have ways of dealing with this, but I haven't found a way to do this with Linux yet.