RHEL Bonding Issue - No consistent checking for a failed link

Hi Guys

We have 16 HP DL320e Gen8 servers running 64-bit RHEL 6.4. Each server has 4 NIC ports: 2 on-board and 2 on an add-on adapter. The network interfaces are named as follows:

on-board NIC ports = em1, em2
extended NIC ports = p2p1, p2p2

We have created 2 bonds:

Bond0 = em1 & p2p1
Bond1 = em2 & p2p2

The drivers for these NICs have been updated to the latest available from HP (18th Feb 2014).

However, we are noticing inconsistent behaviour on the em1 and em2 interfaces: they do not reliably detect that a link has failed.

All the em interfaces connect to switch 11 and all the p2px interfaces connect to switch 12. When switch 11 loses power, some of the interfaces do not detect the failure, and which interfaces are affected appears random. The NetworkManager service has been disabled. The configuration files are attached.

Could experts from the community offer some assistance in resolving this issue?
Looking forward to hearing from you.

Regards
Jo V

Attachments

Responses

Hi Jo,

I am unable to see the attachments, so I do not know if they are relevant to the issue at hand.
Which bonding mode do you use:
load balancing, failover, or another mode?

Have you checked with

ethtool em1

or

ethtool em2

whether the kernel detects the switch failure?
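As a quick way to compare what the driver reports per slave, something like the following could be used (a sketch; the interface names are the ones from this thread):

```shell
#!/bin/sh
# Sketch: print the carrier state each bonded slave's driver currently sees.
# Interface names are taken from this thread; adjust for your host.
for nic in em1 em2 p2p1 p2p2; do
    # ethtool prints a "Link detected: yes|no" line for each interface
    state=$(ethtool "$nic" 2>/dev/null | awk -F': ' '/Link detected/ {print $2}')
    printf '%s: link %s\n' "$nic" "${state:-unknown}"
done
```

If an interface still reports "yes" after switch 11 loses power, the driver itself is not seeing the carrier drop.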

Are the NICs directly attached to switch 11, or are there other network components involved that may fake a carrier detect?

Kind regards,

Jan Gerrit

Hi Jan

I have updated the file for your reference. The em1 & em2 connections go to switch 11 and p2p1 & p2p2 connections go to switch 12.

How do we know if the kernel detects the switch failure?

Regards
Jo

Hey Jo,
There are a number of additional questions I would want to ask, but I will start with:
(Jan - the image indicates Jo is using Mode=1)

In the following example, em1 and p3p1 are bonded; however, p3p1 is actually not connected:

# ethtool em1
Settings for em1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: Unknown
    Supports Wake-on: g
    Wake-on: d
    Link detected: yes
# ethtool p3p1
Settings for p3p1:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full 
                            100baseT/Half 100baseT/Full 
                            1000baseT/Full 
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Speed: Unknown!
    Duplex: Unknown! (255)
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: Unknown
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000007 (7)
                   drv probe link
    Link detected: no
# mii-tool em1
em1: negotiated 100baseTx-FD, link ok
# mii-tool p3p1
p3p1: no link
# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: em1 (primary_reselect always)
Currently Active Slave: em1
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: em1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d4:ae:52:64:6e:0e
Slave queue ID: 0

Slave Interface: p3p1
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 90:e2:ba:02:96:bc
Slave queue ID: 0
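
One detail worth flagging in the output above: "MII Polling Interval (ms): 0" means the bonding driver is not polling the slaves' link state at all, which is why p3p1 can show "MII Status: up" while ethtool reports "Link detected: no". If Jo's bonds show the same, that would match the inconsistent failover. On RHEL 6, MII monitoring is usually enabled through BONDING_OPTS; a sketch (not Jo's actual config, which I haven't seen):

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (hypothetical sketch)
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
# mode=1 is active-backup; miimon=100 checks link state every 100 ms
BONDING_OPTS="mode=1 miimon=100 primary=em1"
```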

Hi James

I have added the output of the following for your reference:

cat /proc/net/bonding/bond0

ethtool em1
ethtool em2
ethtool p2p1
ethtool p2p2

Hi Jo,

It looks like James and Jan have some good tips for you...

One random thought before the remainder of this post: do you use jumbo frames in your environment? And is there any additional configuration for mode 1 (active-backup) on the switch?

All the NIC bonding we've used in my environment (typically on Oracle, but some others) is IEEE 802.3ad (LACP) dynamic link aggregation, mode 4, which certainly requires different switch configuration than the mode 1 you mention. I'm not sure whether additional switch configuration is needed for mode 1.

On the switch, do you see the MACs of the bond interfaces (and only the bonds)?

One source recommended checking the bonding state in /proc. Note that only the bond master interfaces get entries under /proc/net/bonding; the slave interfaces (em1, em2, p2p1, p2p2) do not appear there:

cat /proc/net/bonding/bond0
cat /proc/net/bonding/bond1
/sbin/ip addr show
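
On RHEL 6 kernels the same bonding state is also exposed under /sys; a sketch that walks the standard sysfs bonding attributes for bond0:

```shell
#!/bin/sh
# Sketch: read the standard sysfs bonding attributes for bond0.
# Each file holds a single line, e.g. miimon is the polling interval in ms.
for f in mode miimon active_slave slaves; do
    path=/sys/class/net/bond0/bonding/$f
    [ -r "$path" ] && printf '%s: %s\n' "$f" "$(cat "$path")"
done
```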

Perhaps check the below as you ifup/ifdown the slave interfaces ([see this link from Sept 14th, 2013](http://www.tecmint.com/ethernet-channel-bonding-aka-nic-teaming-on-linux-systems/), and scroll down to 'Create active backup'):

watch -n .1 cat /proc/net/bonding/bond0
watch -n .1 cat /proc/net/bonding/bond1
echo "in another terminal..."
ifdown em1;sleep 1;ifup em1
ifdown em1;sleep 2;ifup em1
echo and so forth with others
echo and watch the output

Does that output show anything relevant for mii status, a polling interval or any sort of link failure?

  • Also, according to this bit from Red Hat, as of bonding version 2.6.2, gratuitous ARPs are sent on failover. Is there a way for you to check for that ARP traffic (see quote below)?

begin quote

"In bonding version 2.6.2 or later, when a failover occurs in active-backup mode, bonding will issue one or more gratuitous ARPs on the newly-active slave. One gratuitous ARP is issued for the bonding master interface and each VLAN interface configured above it, assuming that the interface has at least one IP address configured. Gratuitous ARPs issued for VLAN interfaces are tagged with the appropriate VLAN id."

end quote

This old source (dated 2009) says ARP monitoring and MII monitoring cannot be used simultaneously. Note that the gratuitous ARPs quoted above are sent on failover regardless of which monitoring method is in use; they are not the same thing as ARP link monitoring.
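
For completeness, ARP link monitoring is configured with arp_interval and arp_ip_target in place of miimon; a hypothetical sketch, where 192.168.1.254 is a placeholder for an always-reachable host, not an address from this thread:

```shell
# ifcfg-bond0 variant using ARP monitoring instead of MII (hypothetical sketch)
# arp_ip_target must point at a host that is always reachable;
# 192.168.1.254 below is a placeholder, not from this thread.
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS="mode=1 arp_interval=1000 arp_ip_target=192.168.1.254"
```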

Kind Regards,
Remmele
