
Serious Networking Weirdness

    Yesterday, I patched a RHEL 5 host up from 5.6 to 5.9. After doing so, my networking went wonky.

    System:
    * HP DL380 G6
    * On-board Broadcom NetXtreme II quad-port 1Gbps interface
    * Add-on Mellanox ConnectX dual-port 10Gbps interface
    * Two bonded interfaces:
      * bond0: asymmetrical 10Gbps/1Gbps active/passive pair (working just fine)
      * bond1: asymmetrical 10Gbps/1Gbps active/passive pair (not quite working)

    I'm reasonably sure it's not strictly a driver issue, as both bonds have the same composition (one port off the Mellanox card as the primary link; one port off the Broadcom as the standby link) and one of the bonds works. I've used ifenslave on both bonds to force a change of the active link and verify each NIC's functionality (or lack thereof).
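For reference, the failover forcing looked roughly like this (a sketch; bond1/eth1/eth3 are my interface names, and this needs root):

```shell
BOND=bond1
# Who's the active slave right now?
grep 'Currently Active Slave' "/proc/net/bonding/$BOND" || echo "$BOND not present on this box"
# Force eth3 to become the active slave, then re-check
ifenslave -c "$BOND" eth3 || echo "ifenslave failed (not root, or no $BOND here)"
grep 'Currently Active Slave' "/proc/net/bonding/$BOND" || true
```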

    When I attempt to ping out from bond1 to its local LAN segment, I get ICMP "host unreachable" errors. This happens for selected targets (its application partner, a NAS, and the segment's default gateway device). When I look at the ARP table entries associated with that interface, it shows the IPs for the pinged systems, but shows the associated MAC entries as "&lt;incomplete&gt;".
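For anyone following along, unresolved neighbours show up in arp -an output as &lt;incomplete&gt; entries, and it's easy to pick them out (the sample text below is illustrative, not real output from the affected host):

```shell
# Pull out IPs whose MAC never resolved, from `arp -an`-style output.
# The sample text is made up for illustration.
sample='? (10.0.0.5) at <incomplete> on bond1
? (10.0.0.1) at 00:1a:2b:3c:4d:5e [ether] on bond0'
unresolved=$(printf '%s\n' "$sample" | awk '/<incomplete>/ {print $2}')
echo "$unresolved"    # prints (10.0.0.5)
```

On the real host, pipe `arp -an` in place of the sample text.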

    I tried backing bond1 down to its constituent interfaces (specifically, the Mellanox 10Gbps interface at "eth1"). The ping and ARP results were the same.

    I finally installed tcpdump on both application partners. I fired it up on the problematic host (against eth1) and then pinged from the partner. Inbound ping, which had also previously not been working, started to respond. The afflicted host pinged just fine, right up until I stopped tcpdump against eth1. Weird.

    I decided that eth1 was probably (sorta) good, so I recomposed bond1 from the Mellanox (eth1) and the Broadcom (eth3) interfaces it had previously been composed of. I then started ping on the partner system, getting the expected failures. I then started tcpdump on the bond, and pings started working. I stopped tcpdump, and the pings again failed. I changed my tcpdump to reference eth1, and pings started working again. I changed my tcpdump to reference eth3, and pings continued to fail. I used ifenslave to make eth3 the active link and started tcpdump against eth3 - pings started working.
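One theory worth testing here: tcpdump normally flips the capture interface into promiscuous mode, and its -p flag tells it not to. If a capture run with -p leaves the pings broken, then promiscuous mode itself is what's "fixing" things. A sketch, using the interface names above:

```shell
IFACE=eth1                                # assumed name from my setup above
# Capture WITHOUT promiscuous mode; if pings from the partner stay broken
# while this runs, promiscuous mode is what makes traffic flow.
tcpdump -p -n -i "$IFACE" 'arp or icmp' 2>/dev/null &
CAP=$!
sleep 2                                   # ping from the partner meanwhile
kill "$CAP" 2>/dev/null || true
```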

    Concurrently with the tcpdumps of bond1/eth1/eth3, I watched my ARP tables. While tcpdump was active and the partner system was able to ping the wonky host, my ARP table entries looked normal. Within a couple of seconds of turning tcpdump off, the ARP table entries would again change to "&lt;incomplete&gt;".

    I feel like I'm really close to figuring out what's wrong, but need a final push. If anyone here has any suggestions to get me over the hump, it'd be greatly appreciated.

    [EDIT]
    As a temporary workaround, I ended up turning off tcpdump and doing an ifconfig bond1 promisc. Obviously I can't leave things this way, as the security folks will have a fit if they ever scan the box.
    [/EDIT]
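To keep the workaround above honest, it's easy to confirm the flag is still set and to back it out once a real fix lands (a sketch; bond1 as above):

```shell
BOND=bond1
# Is the workaround still in place? PROMISC shows in the interface flags if so.
ip link show "$BOND" 2>/dev/null | grep -o 'PROMISC' || echo "promisc off (or no $BOND here)"
# Back it out once a real fix is in place:
ifconfig "$BOND" -promisc 2>/dev/null || echo "could not clear promisc (not root, or no $BOND)"
```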

    Thanks in advance.
