bnx2x devices do not fail a faulty link

Solution Verified

Environment

  • Red Hat Enterprise Linux (RHEL) 6.3
  • kernels 2.6.32-220.4.2.el6.x86_64, 2.6.32-220.7.1.el6.x86_64

Issue

  • On RHEL 6.3, bnx2x devices do not fail the link when they see a large number of rx errors and overruns. In a bond of two bnx2x devices, eth0 and eth1, eth0 encounters a large number of rx errors and overruns, yet ethtool still reports the link as detected, so the bond never fails over to eth1, which is not experiencing rx errors or overruns.
  • The problem was seen in a blade enclosure that held a mix of RHEL 5 and RHEL 6 systems. The bonds on RHEL 5 failed over correctly, whereas the bonds on RHEL 6 did not.
  • ifconfig and ethtool output from an affected system:

eth0      Link encap:Ethernet  HWaddr 00:26:55:1B:7C:08  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:69707637 errors:18823718 dropped:0 overruns:18823718 frame:0
          TX packets:17433943 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:12419185456 (11.5 GiB)  TX bytes:10349326962 (9.6 GiB)
          Interrupt:28 Memory:f5000000-f57fffff

bond1     Link encap:Ethernet  HWaddr 00:26:55:1B:7C:08  
          inet6 addr: fe80::226:55ff:fe1b:7c08/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:114978046 errors:18823718 dropped:0 overruns:18823718 frame:0
          TX packets:17433943 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:15902641969 (14.8 GiB)  TX bytes:10349326962 (9.6 GiB)

Settings for eth0:

    Supported ports: [ FIBRE ]
    Supported link modes:   1000baseT/Full 
                            2500baseX/Full 
                            10000baseT/Full 
    Supports auto-negotiation: Yes
    Advertised link modes:  1000baseT/Full 
                            2500baseX/Full 
                            10000baseT/Full 
    Advertised pause frame use: Symmetric Receive-only
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: FIBRE
    PHYAD: 16
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: g
    Current message level: 0x00000000 (0)
    Link detected: yes

Settings for eth1:

    Supported ports: [ FIBRE ]
    Supported link modes:   1000baseT/Full 
                            2500baseX/Full 
                            10000baseT/Full 
    Supports auto-negotiation: Yes
    Advertised link modes:  1000baseT/Full 
                            2500baseX/Full 
                            10000baseT/Full 
    Advertised pause frame use: Symmetric Receive-only
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: FIBRE
    PHYAD: 17
    Transceiver: internal
    Auto-negotiation: on
    Supports Wake-on: g
    Wake-on: g
    Current message level: 0x00000000 (0)
    Link detected: yes
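
To put a number on the symptom, the following is a minimal sketch (not part of the original solution) that parses the RX counter line of ifconfig output like that shown above and flags an interface whose link is still reported as detected despite a high error rate. The function names and the 1% threshold are hypothetical choices for illustration only.

```python
import re

# Matches the RX counter line of legacy ifconfig output, e.g.
# "RX packets:69707637 errors:18823718 dropped:0 overruns:18823718 frame:0"
RX_LINE = re.compile(r"RX packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+)")

# Counter line for eth0, copied from the output above.
SAMPLE = "RX packets:69707637 errors:18823718 dropped:0 overruns:18823718 frame:0"


def rx_errors_per_packet(ifconfig_text):
    """Return rx errors divided by rx packets, or None if no RX line is found."""
    match = RX_LINE.search(ifconfig_text)
    if match is None:
        return None
    packets = int(match.group(1))
    errors = int(match.group(2))
    if packets == 0:
        return 0.0
    return float(errors) / packets


def link_up_but_erroring(ifconfig_text, link_detected, threshold=0.01):
    """True when the link is reported up although the error rate exceeds threshold.

    The threshold is an illustrative assumption, not a value from the driver
    or from the bonding documentation.
    """
    rate = rx_errors_per_packet(ifconfig_text)
    return bool(link_detected) and rate is not None and rate > threshold


if __name__ == "__main__":
    print(rx_errors_per_packet(SAMPLE))
    print(link_up_but_erroring(SAMPLE, link_detected=True))
```

For the eth0 counters above this gives roughly 0.27 errors per received packet while ethtool still shows "Link detected: yes", which is the condition described in this issue.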

Resolution

Update to kernel-2.6.32-358 or later, as described in erratum RHSA-2013:0496-2.
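
As a rough aid for checking several hosts, the sketch below compares a kernel release string against the 2.6.32-358 minimum named above. The helper names are hypothetical, and the comparison assumes RHEL 6 style release strings such as those listed under Environment; it is not a general-purpose RPM version comparison.

```python
FIXED_KERNEL = "2.6.32-358"  # minimum release named in the Resolution


def release_key(release):
    """Turn '2.6.32-220.4.2.el6.x86_64' into a comparable tuple of ints.

    Numeric fields after the dash are kept until the first non-numeric
    field (such as 'el6' or 'x86_64') is reached.
    """
    version, _, rest = release.partition("-")
    fields = [int(part) for part in version.split(".")]
    for part in rest.split("."):
        if not part.isdigit():
            break
        fields.append(int(part))
    return tuple(fields)


def has_fix(release):
    """True when the release is at or above the fixed kernel release."""
    return release_key(release) >= release_key(FIXED_KERNEL)
```

Passing the output of `uname -r` (or `platform.release()` in Python) to `has_fix` indicates whether a host is still on an affected kernel; both kernels listed under Environment compare lower than the fixed release.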

Root Cause

The device never reports a failure and the link is never marked as down; therefore, the bond never fails over to a healthy device.

