How to test NIC bonding?
I believe that I have just set up network bonding on two interfaces. I want to test it to make sure that it will fail over. Is there a command that I can use to fail one interface over to the other?
Thank you.
Daryl
Responses
Just run "ifdown <bonded_interface_with_active_status>" and check the system messages, or disable the port on the switch or unplug the cable, which is a more realistic test :-)
We've always found that using 'ifdown' to simulate a network failure is not a good practice. The best option obviously is to physically remove it or have the network team disable the port. If that's not possible, use "ifenslave" to detach an interface from a bond. For example, if bond0=eth0,eth1 and eth0 is active, use "ifenslave -d bond0 eth0".
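For anyone who wants to script the detach/re-attach approach, here is a minimal sketch. The function name bounce_slave and the RUN variable are made up for illustration, and bond0/eth0 are just the names from the example above. The real commands need root.

```shell
#!/bin/sh
# Sketch only: detach a slave from a bond, wait, then re-attach it.
# RUN is a hypothetical knob: set RUN=echo to print the commands
# instead of executing them.
RUN=${RUN:-}

bounce_slave() {
    bond=$1          # e.g. bond0
    slave=$2         # e.g. eth0
    pause=${3:-10}   # seconds to leave the slave detached

    $RUN ifenslave -d "$bond" "$slave"   # detach the slave from the bond
    $RUN sleep "$pause"                  # time to watch traffic and logs
    $RUN ifenslave "$bond" "$slave"      # re-attach the slave
}

# Example (as root):  bounce_slave bond0 eth0 30
```

Running it with RUN=echo first shows exactly what would be executed before you touch a live bond.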
... well, some "smart alec" stuff (since the tool seems not to be too widely known, and I was also surprised when I first saw it): you can use "ethtool". It will at least reliably show you the status of your interface after any other modifications.
You need to have the "ethtool" package installed ...
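As a rough illustration, the link state can be pulled out of the ethtool output like this. The helper name link_detected is my own, and it assumes the usual "Link detected: yes/no" line that ethtool prints for an interface.

```shell
#!/bin/sh
# Sketch only: read `ethtool <iface>` output on stdin and print the
# value of its "Link detected:" line (normally "yes" or "no").
link_detected() {
    awk -F': *' '/Link detected:/ { print $2 }'
}

# Example:  ethtool eth0 | link_detected
```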
All of you already gave the solution.
You can run these commands:
# cat /proc/net/bonding/bond0
# ifdown eth0    (or: ifdown eth1)
While you take a particular NIC down, you can check the status in real time:
# watch cat /proc/net/bonding/bond0
Or else you can physically unplug the cable from one NIC for testing purposes.
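If you would rather not read the whole status file by eye, a small sketch like the following can pull out the interesting parts. The function names are made up, and it assumes the usual layout of /proc/net/bonding/bond0 ("Currently Active Slave:" in the header, then one "Slave Interface:" block per NIC, each with its own "MII Status:" line).

```shell
#!/bin/sh
# Sketch only: summarise a bonding status file.
# Both functions take an optional file path and default to bond0.

# Print the currently active slave (only present in modes that have one,
# such as active-backup).
active_slave() {
    awk -F': ' '/^Currently Active Slave:/ { print $2 }' \
        "${1:-/proc/net/bonding/bond0}"
}

# Print "<slave> <mii status>" for every slave in the bond.
slave_status() {
    awk -F': ' '
        /^Slave Interface:/            { slave = $2 }
        /^MII Status:/ && slave != ""  { print slave, $2; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Example:  watch -n1 'grep -E "Active Slave|Slave Interface|MII Status" /proc/net/bonding/bond0'
```

Comparing the output of active_slave before and after taking a NIC down is a quick way to confirm that the bond really switched.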
//shyfur
In addition to the previous answers, I find ifenslave very handy:
ifenslave -c|--change-active <master-if> <slave-if>
I agree with Duane that ifdown is not a sufficient test.
Think of what you are trying to achieve with bonding. You want:
- resiliency against electrical failure (eg: NIC fault, SFP fault/pull, cable fault/cut/pull)
- resiliency against logical failure (eg: someone logs onto the switch and puts an access-list on your switchport)
Ideally you should do an actual test for both of these. One with a physical cable pull, and one with some sort of logical interruption like an ACL or VLAN change.
ifenslave to remove and add an interface is more a test of the bonding driver's slave functionality than its failover, but at least that will test that the bonded MAC fails over to the other interfaces (if the bonding mode does that) and traffic continues to flow during a fail scenario.
Note that remotely shutting the switchport doesn't always result in the NIC/driver considering the port to be "down". Some network interfaces seem to require a physical cable pull for miimon to consider the interface as failed.
Have a look at http://www.kernel.org/doc/Documentation/networking/bonding.txt under "7. Link Monitoring".
Using ARP monitoring might solve the problem of the NIC not failing/reporting down when a switch port is shut down. I had the same problem with some blades in an old enclosure that kept showing link on the presented NICs even though it had lost both uplinks.
For testing I would, as mentioned, unplug the cable and/or down the interface. I would not use ifdown but rather "ifconfig eth0 down" followed by "ifconfig eth0 up". ifdown is a script that takes the interface down gracefully, and that's not what I want, is it!
Well spotted :) The ARP monitor is a good option for blades, as some models have backplane connectivity and there's no "single cable" to pull for a given interface. It's also a good way to confirm your network can pass "actual traffic", as opposed to just link connectivity.
A few things to keep in mind for arp_mon:
Check link status much less often than you would with miimon. A monitoring interval of 100ms is not unreasonable for miimon, as all that's involved is checking something in the driver on the system. The whole system call to check connectivity via ethtool completes in less than 1ms.
With the ARP monitor, a short interval can easily flood the network with ARP requests. Enough systems all using the one ARP target could even overwhelm the target. It's better to set the monitoring interval to at least 1000ms (ie: 1 second) and perhaps as high as 10 seconds, depending on the network.
It is better to set multiple ARP targets, so that your systems don't all think their bonds have failed just because the router or switch (or other RHEL system you're using as an ARP target) has a scheduled maintenance reboot.
Of course, ensure your ARP target is in the same broadcast domain as the server itself. ARP is a layer 2 protocol and cannot be routed.
Setting an ARP target outside the local LAN will result in no incoming ARP reply, which may result in the ARP monitor thinking the interface is down just because no other hosts talk to it for the arp_interval (this behaviour differs between RHEL4 and RHEL5).
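To make the points above concrete, here is what an ARP-monitored bond might look like in an ifcfg file. This is only a sketch: it assumes a release that reads bonding options from BONDING_OPTS in ifcfg-bond0 (older setups put them in /etc/modprobe.conf instead), and the two target addresses are placeholders for hosts on your own local subnet.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (fragment, sketch only)
# 192.168.1.1 and 192.168.1.2 are placeholder ARP targets; replace them
# with two hosts in the same broadcast domain as this server.
# arp_interval is in milliseconds; 1000 = one ARP probe per second.
# Use either ARP monitoring or miimon here, not both.
BONDING_OPTS="mode=active-backup arp_interval=1000 arp_ip_target=192.168.1.1,192.168.1.2"
```

After restarting the bond, /proc/net/bonding/bond0 should show the ARP polling interval and the ARP IP targets, which is an easy way to confirm the options were picked up.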
