Bonded interface is unreachable after reboot but shortly after can be reached
Environment
- Red Hat Enterprise Linux 8
  - Specifically with NetworkManager versions below NetworkManager-1.30.0-7.el8
- Network interfaces in an active-backup bond configuration
Issue
- Network bond sending incorrect MAC in gARP on boot
- When the system boots, it sends out a bad MAC address in the gratuitous ARP (gARP). The correct MAC is shown in ip link and when checking the bond status, so this appears to happen sometime during boot.
- The bond first comes up with a random MAC and assigns that MAC to the first sub-interface rather than using the MAC of the first sub-interface.
Resolution
- Update NetworkManager to version NetworkManager-1.30.0-7.el8 or later, as per errata RHSA-2021:1574.
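Whether a host still needs the update can be checked by comparing the installed package version with the fixed one. The following is a minimal sketch, not Red Hat tooling: the helper names version_lt and check_nm_version are hypothetical, and it assumes that GNU sort -V ordering is close enough to RPM's own version comparison for this purpose.

```shell
#!/bin/sh
# First NetworkManager build that contains the fix, per the Resolution above.
FIXED="1.30.0-7.el8"

# version_lt A B: succeeds when A sorts strictly before B.
# Assumption: GNU `sort -V` approximates RPM version ordering here.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# check_nm_version VERSION-RELEASE: prints "affected" or "fixed".
check_nm_version() {
    if version_lt "$1" "$FIXED"; then
        echo "affected"
    else
        echo "fixed"
    fi
}

# On a live system (not run here):
#   check_nm_version "$(rpm -q --qf '%{VERSION}-%{RELEASE}' NetworkManager)"
```

A result of "affected" suggests the installed build predates the fix; the authoritative check remains the errata itself.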
Workaround
- Set cloned-mac-address on the bond interface:

    nmcli connection modify "bond0" "802-3-ethernet.cloned-mac-address" <MAC ADDR>
    dracut -f    <--- required for the change to be applied on boot as well.

- Example of setting cloned-mac-address on a bond within a VLAN:

    nmcli connection modify "bond0" "802-3-ethernet.cloned-mac-address" 11:22:33:44:55:66
    nmcli connection modify "bond0.1234" "802-3-ethernet.cloned-mac-address" 11:22:33:44:55:66    <--- applies to VLAN interface
    dracut -f
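The workaround needs a concrete MAC address. The sketch below assumes the desired value is the permanent hardware address of the first sub-interface (the article does not state which address to use) and that it is read with ethtool -P. The helpers perm_mac, is_mac, and print_workaround are hypothetical names; the script only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# perm_mac: read `ethtool -P <iface>` output on stdin and print the MAC.
# Expected input looks like: "Permanent address: 55:55:55:55:55:55"
perm_mac() {
    awk '/^Permanent address:/ { print tolower($3) }'
}

# is_mac STRING: succeeds if STRING is six colon-separated hex octets.
is_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# print_workaround CONNECTION MAC: print (do not run) the workaround commands.
print_workaround() {
    is_mac "$2" || { echo "invalid MAC: $2" >&2; return 1; }
    printf 'nmcli connection modify "%s" "802-3-ethernet.cloned-mac-address" %s\n' "$1" "$2"
    printf 'dracut -f\n'
}

# On a live system (not run here), assuming eth0 is the first sub-interface:
#   print_workaround bond0 "$(ethtool -P eth0 | perm_mac)"
```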
Root Cause
NetworkManager attempted to restore a prior MAC address when a bond configuration changed and ended up assigning the random MAC given to the bond on creation to the bond's sub-interfaces. To elaborate:
- On bond creation, the empty bond is assigned a random MAC address by the kernel and NetworkManager remembers this.
- The default behavior of bonds when a sub-interface is added to the bond is for the bond to use the sub-interface's MAC address as its own MAC address.
- When fail_over_mac is used with the bond configuration, this changes the default MAC address assignment behavior. With fail_over_mac set (e.g. fail_over_mac=1 or fail_over_mac=follow) on an active-backup bond, the kernel updates link connectivity state (i.e. whether the bond is connected to the network or not) but does not update the MAC address.
- NetworkManager recognized the random MAC address assigned on bond creation as the legitimate MAC address. When a sub-interface was added to the bond, NetworkManager assigned the random MAC address to the sub-interface as well.
- The expected behavior is for the bond to pick up the first active sub-interface's MAC address. NetworkManager watches for the kernel to update the MAC address, but the kernel does not under the aforementioned scenario, so the bond and its sub-interfaces are not assigned the correct MAC addresses and are instead assigned the random address provided to the initially empty bond.
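To see whether a bond matches the configuration described above (active-backup mode with fail_over_mac set), the bonding driver's sysfs attributes can be inspected. This is a sketch under stated assumptions: the function name bond_uses_fail_over_mac and its optional sysfs-root argument are hypothetical, and it relies on the mode and fail_over_mac files under /sys/class/net/<bond>/bonding/ reporting a name followed by a number (for example "active-backup 1" and "follow 2").

```shell
#!/bin/sh
# bond_uses_fail_over_mac BOND [SYSFS_ROOT]
# Returns 0 if BOND is active-backup with fail_over_mac set to anything
# other than "none", 1 if not, and 2 if the sysfs files cannot be read.
bond_uses_fail_over_mac() {
    dir="${2:-/sys}/class/net/$1/bonding"
    if [ ! -r "$dir/mode" ] || [ ! -r "$dir/fail_over_mac" ]; then
        return 2
    fi
    # Each file holds "<name> <number>"; keep only the name.
    mode=$(cut -d ' ' -f 1 "$dir/mode")
    fom=$(cut -d ' ' -f 1 "$dir/fail_over_mac")
    [ "$mode" = "active-backup" ] && [ "$fom" != "none" ]
}

# On a live system (not run here):
#   bond_uses_fail_over_mac bond0 && echo "bond0 uses fail_over_mac"
```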
Diagnostic Steps
Steps to reproduce
- On a system (physical or virtual) with two or more network adapters, create an active-backup bond interface and add two or more of the network adapters as sub-interfaces to the bond.
- Review the MAC addresses of the devices. Below is an example:

    # ip link
    [...]
    2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond state UP group default qlen 1000
        link/ether 11:11:11:11:11:11 brd ff:ff:ff:ff:ff:ff permaddr 55:55:55:55:55:55
    3: bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether 11:11:11:11:11:11 brd ff:ff:ff:ff:ff:ff
    4: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond state UP group default qlen 1000
        link/ether 22:22:22:22:22:22 brd ff:ff:ff:ff:ff:ff

- Reboot and review the MAC addresses:

    # ip link
    [...]
    2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond state UP group default qlen 1000
        link/ether 55:55:55:55:55:55 brd ff:ff:ff:ff:ff:ff
    3: bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether 55:55:55:55:55:55 brd ff:ff:ff:ff:ff:ff
    4: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond state UP group default qlen 1000
        link/ether 22:22:22:22:22:22 brd ff:ff:ff:ff:ff:ff

- The bond should retain the same MAC address across further reboots:

    # ip link
    [...]
    2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond state UP group default qlen 1000
        link/ether 55:55:55:55:55:55 brd ff:ff:ff:ff:ff:ff
    3: bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
        link/ether 55:55:55:55:55:55 brd ff:ff:ff:ff:ff:ff
    4: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond state UP group default qlen 1000
        link/ether 22:22:22:22:22:22 brd ff:ff:ff:ff:ff:ff
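The comparison in the steps above can also be scripted. The sketch below assumes the bond is expected to carry the permanent address of one of its sub-interfaces, and that those addresses are listed on "Permanent HW addr:" lines in /proc/net/bonding/<bond>. The function name bond_mac_matches_slave is hypothetical.

```shell
#!/bin/sh
# bond_mac_matches_slave BOND_MAC BONDING_FILE
# Succeeds if BOND_MAC equals the permanent hardware address of any
# sub-interface listed in BONDING_FILE (e.g. /proc/net/bonding/bond0).
bond_mac_matches_slave() {
    awk -v mac="$(printf '%s' "$1" | tr 'A-F' 'a-f')" '
        /^Permanent HW addr:/ { if (tolower($4) == mac) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# On a live system (not run here), assuming the bond is named bond0:
#   bond_mac_matches_slave "$(cat /sys/class/net/bond0/address)" \
#       /proc/net/bonding/bond0 || echo "bond0 MAC matches no sub-interface"
```

A mismatch is only a hint that the bond came up with an address other than a sub-interface's own; a deliberately configured cloned-mac-address would also produce one.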
What to look for
For example purposes, the desired bond MAC address is 55:55:55:55:55:55 for eth0 and 55:55:55:55:55:56 for eth1.
- Review dmesg, rsyslog logs (/var/log/messages), or system journal logs (journalctl -k) for instances of an incorrect MAC address being assigned to the bond on boot:

    $ grep -ir 'set new mac address' -B 3 var/log/messages
    Sep 1 21:09:01 hostname kernel: device eth0 left promiscuous mode
    Sep 1 21:09:01 hostname kernel: bond0: (slave eth0): making interface the new active one
    Sep 1 21:09:01 hostname kernel: device eth1 entered promiscuous mode
    Sep 1 21:09:01 hostname kernel: i40e 0000:19:00.0 eth0: set new mac address 55:55:55:55:55:55
    --
    Sep 1 21:38:24 hostname kernel: 8021q: adding VLAN 0 to HW filter on device eth0
    Sep 1 21:38:24 hostname kernel: bond0: (slave eth0): making interface the new active one <---
    Sep 1 21:38:24 hostname kernel: device eth0 entered promiscuous mode
    Sep 1 21:38:24 hostname kernel: i40e 0000:19:00.0 eth0: set new mac address de:ad:00:00:be:ef <---
    --
    Sep 1 21:40:27 hostname kernel: device eth0 left promiscuous mode
    Sep 1 21:40:27 hostname kernel: bond0: (slave eth1): making interface the new active one <---
    Sep 1 21:40:27 hostname kernel: device eth1 entered promiscuous mode
    Sep 1 21:40:27 hostname kernel: i40e 0000:19:00.1 eth1: set new mac address ab:12:ab:12:ab:12 <---
    --
    Sep 1 21:40:30 hostname kernel: 8021q: adding VLAN 0 to HW filter on device eth0
    Sep 1 21:40:30 hostname kernel: bond0: (slave eth0): making interface the new active one <---
    Sep 1 21:40:30 hostname kernel: device eth0 entered promiscuous mode
    Sep 1 21:40:30 hostname kernel: i40e 0000:19:00.0 eth0: set new mac address so:me:ra:nd:om <---
    --
    Sep 2 01:20:53 hostname kernel: bond0: (slave eth1): making interface the new active one <---
    Sep 2 01:20:53 hostname kernel: device eth0 left promiscuous mode
    Sep 2 01:20:53 hostname kernel: device eth1 entered promiscuous mode
    Sep 2 01:20:53 hostname kernel: i40e 0000:19:00.1 eth1: set new mac address 12:34:56:78:90:10 <---
    Sep 2 01:20:53 hostname kernel: i40e 0000:19:00.0 eth0: set new mac address 55:55:55:55:55:55
    --
    Sep 2 01:25:51 hostname kernel: 8021q: adding VLAN 0 to HW filter on device eth1
    Sep 2 01:25:51 hostname kernel: bond0: (slave eth1): making interface the new active one
    Sep 2 01:25:51 hostname kernel: device eth1 entered promiscuous mode
    Sep 2 01:25:51 hostname kernel: i40e 0000:19:00.1 eth1: set new mac address 55:55:55:55:55:56

- In the above output, the lines marked <--- show the first active sub-interface coming online and joining the bond, but being assigned the random MAC address upon joining.
- Review packet captures of the system as it comes online, using your preferred packet capture and analysis tool (tcpdump, Cisco EPC, etc.), and check the MAC address of the bond interface within the Gratuitous ARP (gARP) sent from the problem system on boot. If the sender MAC address in the gARP packet does not match the origin MAC address as determined by the packet analysis tool, then this issue may be occurring.
- Review the ARP table in the switch. If the MAC address of the bond interface does not match the MAC address in the ARP table while the problem system is unreachable, the issue may be occurring.
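The log review under "What to look for" can be partly automated by filtering "set new mac address" kernel messages against the list of MAC addresses that are expected on the system. This is a minimal sketch: the function name unexpected_mac_lines is hypothetical, and it assumes the driver logs the address as the word following "address", as the i40e lines in the example output do.

```shell
#!/bin/sh
# unexpected_mac_lines EXPECTED_MAC...
# Reads kernel log lines on stdin and prints every "set new mac address"
# line whose address is not one of the EXPECTED_MAC arguments.
unexpected_mac_lines() {
    # Build a space-delimited, lower-case list: " mac1 mac2 "
    expected=" $(printf '%s ' "$@" | tr 'A-F' 'a-f')"
    awk -v expected="$expected" '
        /set new mac address/ {
            mac = ""
            # The MAC is the field right after the word "address".
            for (i = 1; i < NF; i++)
                if ($i == "address") mac = tolower($(i + 1))
            if (index(expected, " " mac " ") == 0) print
        }
    '
}

# On a live system (not run here), using the example addresses from above:
#   journalctl -k | unexpected_mac_lines 55:55:55:55:55:55 55:55:55:55:55:56
```

Any line printed is a candidate for the random address described in the Root Cause and should be compared against the boot timestamps.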