What does the message "bonding: bond0: link status definitely down for interface eth0, disabling it" mean?

Environment

  • Red Hat Enterprise Linux
  • Bonding of network links

Issue

  • /var/log/messages log reports these messages. What do they mean?
bonding: bond0: link status definitely down for interface eth0, disabling it

Resolution

  • This message means that bonding's link monitoring has determined that the slave interface (eth0 in this example) has lost link, so the bond has stopped using that slave.

Root Cause

Bonding uses link monitoring to check that bonding slaves are up and working.

There is MII monitoring, which checks the device state to ensure the link is up. This state is maintained by the slave interface's driver in the device's net_device structure and is checked by calling netif_carrier_ok(). MII stands for Media Independent Interface; the MII is the part of the hardware that determines whether the link is up.
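
The carrier state described above is also visible from userspace through the carrier attribute in sysfs. The following is a minimal sketch for inspecting it, not part of the bonding driver. The helper name carrier_is_up and the sysfs_root parameter are illustrative assumptions.

```python
import os


def carrier_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True/False for the carrier state, or None if it cannot be read.

    Hypothetical helper: reads <sysfs_root>/<iface>/carrier, which holds
    "1" when the driver reports carrier and "0" when it does not.
    """
    path = os.path.join(sysfs_root, iface, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # The interface does not exist, or it is administratively down
        # (reading the carrier attribute fails in that case).
        return None
```

For example, carrier_is_up("eth0") returning False at the time of the message would be consistent with the driver reporting a lost link.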

There is ARP monitoring, which sends ARP requests to an external system in the same IP broadcast domain and checks that replies arrive on the relevant interfaces. This state is maintained by timers inside the bonding driver, which are updated as the bonding driver sends and receives ARP traffic.
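
The timer-based idea behind ARP monitoring can be illustrated with a toy model. This is a simplified sketch, not the kernel's actual algorithm: the class name, the missed_intervals threshold, and its default value are assumptions made for illustration only.

```python
class ArpLinkModel:
    """Toy model of a timer-based ARP link check (not the kernel's code)."""

    def __init__(self, arp_interval_ms, missed_intervals=2):
        self.arp_interval_ms = arp_interval_ms
        # Hypothetical threshold: how many polling intervals may pass
        # without ARP traffic before the link is treated as down.
        self.missed_intervals = missed_intervals
        self.last_rx_ms = 0

    def arp_received(self, now_ms):
        """Record the time at which ARP traffic was last seen."""
        self.last_rx_ms = now_ms

    def link_up(self, now_ms):
        """The link counts as up while the last ARP traffic is recent enough."""
        allowed_ms = self.arp_interval_ms * self.missed_intervals
        return now_ms - self.last_rx_ms <= allowed_ms
```

In this model a slave is reported down purely because no ARP traffic arrived in time, which is why an unresponsive ARP target can look like a link failure.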

Bonding requires a link monitoring mode to detect slave failures.

Diagnostic Steps

Ensure the link monitoring mode is not checking too often. miimon should be set to 100 ms or greater, and arp_interval should be set to 1000 ms or greater. Checking too often can produce false positives for link failure.
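
The configured intervals can be read from /proc/net/bonding/bond0. The sketch below checks them against the minimums suggested above. It assumes the file contains lines labelled "MII Polling Interval (ms):" and "ARP Polling Interval (ms):", as is typical of that file's output. The function name and warning texts are illustrative.

```python
import re

MIN_MIIMON_MS = 100         # suggested minimum for MII polling
MIN_ARP_INTERVAL_MS = 1000  # suggested minimum for ARP polling


def check_intervals(proc_text):
    """Return a list of warnings for the given /proc/net/bonding text."""
    warnings = []
    mii = re.search(r"^MII Polling Interval \(ms\):\s*(\d+)", proc_text, re.M)
    arp = re.search(r"^ARP Polling Interval \(ms\):\s*(\d+)", proc_text, re.M)
    # A missing line is treated the same as an interval of 0 (disabled).
    mii_ms = int(mii.group(1)) if mii else 0
    arp_ms = int(arp.group(1)) if arp else 0

    if mii_ms == 0 and arp_ms == 0:
        warnings.append("no link monitoring enabled")
    if 0 < mii_ms < MIN_MIIMON_MS:
        warnings.append(f"miimon is {mii_ms} ms, below {MIN_MIIMON_MS} ms")
    if 0 < arp_ms < MIN_ARP_INTERVAL_MS:
        warnings.append(
            f"ARP interval is {arp_ms} ms, below {MIN_ARP_INTERVAL_MS} ms"
        )
    return warnings


# Example use on a live system (bond name taken from the log message):
#   with open("/proc/net/bonding/bond0") as f:
#       print(check_intervals(f.read()))
```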

Check other logs around the event. Most NICs will log when the link goes down. If these logs are seen before the bonding driver message, then the hardware reported that the link actually did go down, and the cause of the link failure is external to the operating system. To troubleshoot further, check switch logs, reseat or replace cables, or, if ARP monitoring is in use, see whether a remote ARP target was not responding.
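
The log ordering check described above can be sketched as a small script. The bonding message pattern comes from the message in this article. The driver-side pattern "link is down" is an assumption, since the exact wording varies by NIC driver. The function name and lookback parameter are illustrative.

```python
import re

# Pattern taken from the bonding message quoted in this article.
BOND_DOWN = re.compile(
    r"bonding: (\S+): link status definitely down for interface (\S+),"
)
# Assumed driver wording; adjust to match the NIC driver in use.
NIC_DOWN = re.compile(r"link is down", re.IGNORECASE)


def nic_reported_down_first(log_lines, lookback=5):
    """For each bonding 'definitely down' line, return (interface, seen).

    seen is True when one of the preceding `lookback` lines matches the
    assumed driver pattern and names the same interface.
    """
    results = []
    for i, line in enumerate(log_lines):
        match = BOND_DOWN.search(line)
        if not match:
            continue
        iface = match.group(2)
        window = log_lines[max(0, i - lookback):i]
        seen = any(NIC_DOWN.search(prev) and iface in prev for prev in window)
        results.append((iface, seen))
    return results


# Example use:
#   with open("/var/log/messages") as f:
#       print(nic_reported_down_first(f.read().splitlines()))
```

A result of (interface, False) does not prove the hardware link stayed up. It only means no matching driver message was found in the lines searched.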

Ensure the system was not overloaded to the point where link monitoring kernel functions could not run, or system timers became inaccurate. This would require a high load average and most or all CPUs running uninterruptible kernel functions.
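
As a rough aid for the load check above, the load average can be normalized by CPU count. This sketch parses text in the format of /proc/loadavg. The function name is illustrative, and the result reflects only the load at the time the file is read, not the load at the time of the logged event.

```python
def load_per_cpu(loadavg_text, cpu_count):
    """Return the 1-, 5- and 15-minute load averages divided by CPU count.

    loadavg_text is expected in /proc/loadavg format, where the first
    three whitespace-separated fields are the load averages.
    """
    fields = loadavg_text.split()
    return tuple(float(value) / cpu_count for value in fields[:3])


# Example use:
#   import os
#   with open("/proc/loadavg") as f:
#       print(load_per_cpu(f.read(), os.cpu_count()))
```

Values well above 1.0 per CPU suggest more runnable or uninterruptible tasks than the CPUs can service.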

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.