
Serious Networking Weirdness

    Yesterday, I patched a RHEL 5 host up from 5.6 to 5.9. After doing so, my networking went wonky.

    System:
    * HP DL380 G6
    * On-board Broadcom NetXtreme II quad-port 1Gbps interface
    * Add-on Mellanox ConnectX dual-port 10Gbps interface
    * Two bonded interfaces:
      * bond0: asymmetrical 10Gbps/1Gbps active/passive pair (working just fine)
      * bond1: asymmetrical 10Gbps/1Gbps active/passive pair (not quite working)

    I'm reasonably sure it's not strictly a driver issue, as both bonds have the same composition (one port off the Mellanox card as the primary link; one port off the Broadcom as the standby link) and one of the bonds works. I've used ifenslave on both bonds to force a change of the active link and verify each NIC's functionality (or lack thereof).
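For reference, the failover forcing looked roughly like this (a sketch; bond1/eth1/eth3 are my interface names, and this needs root):

```shell
BOND=bond1
# Who's the active slave right now?
grep 'Currently Active Slave' "/proc/net/bonding/$BOND" || echo "$BOND not present on this box"
# Force eth3 to become the active slave, then re-check
ifenslave -c "$BOND" eth3 || echo "ifenslave failed (not root, or no $BOND here)"
grep 'Currently Active Slave' "/proc/net/bonding/$BOND" || true
```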

    When I attempt to ping out from bond1 to its local LAN segment, I get ICMP "host unreachable" errors. This happens for selected targets (its application partner, a NAS, and the segment's default gateway device). When I look at the ARP table entries associated with that interface, it shows the IPs for the pinged systems, but shows the associated MAC entries as "&lt;incomplete&gt;".
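For anyone following along, unresolved neighbours show up in arp -an output as &lt;incomplete&gt; entries, and it's easy to pick them out (the sample text below is illustrative, not real output from the affected host):

```shell
# Pull out IPs whose MAC never resolved, from `arp -an`-style output.
# The sample text is made up for illustration.
sample='? (10.0.0.5) at <incomplete> on bond1
? (10.0.0.1) at 00:1a:2b:3c:4d:5e [ether] on bond0'
unresolved=$(printf '%s\n' "$sample" | awk '/<incomplete>/ {print $2}')
echo "$unresolved"    # prints (10.0.0.5)
```

On the real host, pipe `arp -an` in place of the sample text.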

    I tried backing bond1 down to its constituent interfaces (specifically, the Mellanox 10Gbps interface at "eth1"). The ping and ARP results were the same.

    I finally installed tcpdump on both application partners. I fired it up on the problematic host (against eth1) and then pinged from the partner. Inbound ping, which had also previously not been working, started to respond. The afflicted host pinged just fine, right up until I stopped tcpdump against eth1. Weird.

    I decided that eth1 was probably (sorta) good, so I recomposed bond1 from the Mellanox (eth1) and the Broadcom (eth3) interfaces it had previously been composed of. I then started ping on the partner system, getting the expected failures. I then started tcpdump on the bond, and pings started working. I stopped tcpdump, and the pings again failed. I changed my tcpdump to reference eth1, and pings started working again. I changed my tcpdump to reference eth3, and pings continued to fail. I used ifenslave to make eth3 the active link and started tcpdump against eth3 - pings started working.
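One theory worth testing here: tcpdump normally flips the capture interface into promiscuous mode, and its -p flag tells it not to. If a capture run with -p leaves the pings broken, then promiscuous mode itself is what's "fixing" things. A sketch, using the interface names above:

```shell
IFACE=eth1                                # assumed name from my setup above
# Capture WITHOUT promiscuous mode; if pings from the partner stay broken
# while this runs, promiscuous mode is what makes traffic flow.
tcpdump -p -n -i "$IFACE" 'arp or icmp' 2>/dev/null &
CAP=$!
sleep 2                                   # ping from the partner meanwhile
kill "$CAP" 2>/dev/null || true
```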

    Concurrently with the tcpdumps of bond1/eth1/eth3, I watched my ARP tables. While tcpdump was active and the partner system was able to ping the wonky host, my ARP table entries looked normal. Within a couple of seconds of turning tcpdump off, the ARP table entries would again change to "&lt;incomplete&gt;".

    I feel like I'm really close to figuring out what's wrong, but need a final push. If anyone here has any suggestions to get me over the hump, it'd be greatly appreciated.

    [EDIT]
    As a temporary workaround, I ended up turning off tcpdump and doing an ifconfig bond1 promisc. Obviously I can't leave things this way, as the security folks will have a fit if they ever scan the box.
    [/EDIT]
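To keep the workaround above honest, it's easy to confirm the flag is still set and to back it out once a real fix lands (a sketch; bond1 as above):

```shell
BOND=bond1
# Is the workaround still in place? PROMISC shows in the interface flags if so.
ip link show "$BOND" 2>/dev/null | grep -o 'PROMISC' || echo "promisc off (or no $BOND here)"
# Back it out once a real fix is in place:
ifconfig "$BOND" -promisc 2>/dev/null || echo "could not clear promisc (not root, or no $BOND)"
```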

    Thanks in advance.
