Kernel crashes when using jumbo frames with Broadcom NICs in Red Hat Enterprise Linux
Environment
- Systems with Ethernet interfaces that require the bnx2 (Broadcom NetXtreme II BCM5706/5708/5709/5716) driver
- Ethernet networking with jumbo frames (MTU larger than 1500)
- Red Hat Enterprise Linux 4.8, or Red Hat Enterprise Linux 5 prior to 5.4
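To check whether an interface matches the conditions listed above, something along the following lines may help. This is only a sketch: the helper names (driver_from_info, mtu_is_jumbo, check_iface) are made up for illustration, and it assumes that ethtool is installed and that /sys/class/net/<iface>/mtu is readable on the running kernel.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of any tool.

# Print the driver name found in `ethtool -i` output read from stdin.
driver_from_info() {
    sed -n 's/^driver: *//p'
}

# Succeed when the given MTU value is larger than the standard 1500.
mtu_is_jumbo() {
    [ "$1" -gt 1500 ]
}

# Report one interface. Assumes ethtool is installed and that
# /sys/class/net/<iface>/mtu exists on this kernel.
check_iface() {
    iface=$1
    driver=$(ethtool -i "$iface" 2>/dev/null | driver_from_info)
    mtu=$(cat "/sys/class/net/$iface/mtu" 2>/dev/null)
    if [ "$driver" = "bnx2" ] && mtu_is_jumbo "${mtu:-0}"; then
        echo "$iface: bnx2 with MTU $mtu (matches the affected configuration)"
    else
        echo "$iface: driver=${driver:-unknown} mtu=${mtu:-unknown}"
    fi
}
```

After loading these functions into a shell, running check_iface eth0 (with eth0 replaced by the interface in question) prints a one-line summary for that interface.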
Issue
When Broadcom Ethernet cards that require the bnx2 driver are used with jumbo Ethernet frames (that is, with an MTU larger than 1500), the kernel may crash with varying backtraces that, at first sight, are not always clearly associated with a network card driver issue.
Resolution
Red Hat Enterprise Linux 5
This issue has been fixed as of the 5.4 GA kernel, kernel version 2.6.18-164, through an update of the bnx2 driver version. This update includes fixes for the driver's memory handling, in particular during error handling for jumbo frames.
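One way to tell whether a Red Hat Enterprise Linux 5 system already runs a kernel at or above 2.6.18-164 is to compare the build number in the kernel release string. The function below is a minimal sketch: its name (rhel5_kernel_has_fix) is hypothetical, and it assumes the release string has the usual 2.6.18-<build>... form reported by uname -r.

```shell
#!/bin/sh
# Sketch only: rhel5_kernel_has_fix is a hypothetical helper name.
# Succeeds when a RHEL 5 kernel release string is at or above 2.6.18-164.
rhel5_kernel_has_fix() {
    case $1 in
        2.6.18-*) ;;
        *) return 2 ;;   # not a 2.6.18 kernel; this check does not apply
    esac
    rel=${1#2.6.18-}     # e.g. "128.1.6.el5" or "164.el5"
    rel=${rel%%.*}       # leading build number, e.g. "128" or "164"
    [ "$rel" -ge 164 ]
}
```

A typical call would be rhel5_kernel_has_fix "$(uname -r)" followed by a check of the exit status; a non-zero status means the running kernel either predates the fix or is not a 2.6.18 kernel at all.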
Red Hat Enterprise Linux 4
This issue has been fixed as of the 4.9 GA kernel, kernel version 2.6.9-89.100. Quoting the Red Hat Enterprise Linux 4.9 Release Notes:
BZ#523691
Using the bnx2 driver under heavy NFS usage could have caused the kernel to panic; this was traced back to the poll_freewait() function. The following message was received prior to the kernel panic:
RPC: Invalid TCP record fragment length
This update resolves this issue by updating the 5706/5708 firmware, eliminating TSO header modifications, and fixing jumbo frame error handling, with the result that the bnx2 driver no longer has to modify TCP/IP header fields when transmitting TCP Segmentation Offload (TSO) packets.
Workarounds
When updating to the Red Hat Enterprise Linux 5.4 (or newer) kernel is not an option, this issue can be worked around in a number of ways:
- By disabling the use of jumbo frames for those ethernet interfaces that are driven by the bnx2 driver.
  Note: this may impact the system's performance.
- By updating the firmware of the Broadcom NIC using tools provided by your system's hardware vendor.
  Note: this workaround is only available for some systems and may not be fully effective.
- By disabling TCP Segmentation Offloading (TSO) for those ethernet interfaces that are driven by the bnx2 driver through ethtool -K ethN tso off
  Note: this workaround is known to work in some, but not all affected environments.
- By using a different network card which does not require the bnx2 driver.
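The two software workarounds listed above (a standard MTU and TSO switched off) could be applied per interface roughly as follows. This is a sketch rather than a vetted procedure: the function name bnx2_workaround_cmds is made up, and it only prints the commands so that they can be reviewed before being run as root. The ip link invocation is an assumption about how the MTU is changed at runtime on the system in question; the ethtool command is the one given above.

```shell
#!/bin/sh
# Sketch only: bnx2_workaround_cmds is a hypothetical helper name.
# Prints (does not run) the commands that set a standard MTU and
# disable TSO on the given interface.
bnx2_workaround_cmds() {
    iface=$1
    echo "ip link set dev $iface mtu 1500"
    echo "ethtool -K $iface tso off"
}
```

For example, bnx2_workaround_cmds eth0 prints the two commands for eth0; piping that output to sh as root applies them. Settings changed this way do not survive a reboot or a driver reload, so they would need to be reapplied at boot.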
Root Cause
The root cause of this issue is a defect in the bnx2 driver and its embedded firmware which is remedied through a firmware upgrade and two upstream patches, one to eliminate TSO header modifications (a1efb4b686babf38e5e63add8b990f18e38becc4) and one to fix the error handling for jumbo frames (990ec3804bb9fd37fcce3e165c95e8b79a783aa3).
Comments
The incorrect memory handling by the bnx2 driver in affected versions is known to corrupt the kernel's page tables, which accounts for the variety of backtraces that have been seen on affected systems.
The bnx2 driver update in Red Hat Enterprise Linux 5.4 is discussed in the Red Hat Enterprise Linux 5.4 Technical Notes - Network Driver Updates.