802.1Q + bond (LACP EtherChannel)

I am trying to configure an LACP EtherChannel trunk (802.1Q) from a RHEL server to Cumulus Linux. I can get it up, running, and stable, but the initial setup is very frustrating. Bringing up a bonded trunk does not seem consistent or automatable.

Let's say you have a Cumulus Linux switch:
CL swp1 -> eth1 RHEL
CL swp2 -> eth2 RHEL

On the RHEL side->
/etc/sysconfig/network-scripts/ifcfg-eth1

DEVICE=eth1
NAME=bond0-slave
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

For the bond
/etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0
NAME=bond0
TYPE=Bond
BONDING_MASTER=yes
IPADDR=1.1.1.1
PREFIX=24
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1 xmit_hash_policy=layer3+4"

For this example I will just do one VLAN, but you can obviously have thousands of these.
/etc/sysconfig/network-scripts/ifcfg-bond0.400

DEVICE=bond0.400
IPADDR=10.11.0.100
PREFIX=24
ONBOOT=yes
BOOTPROTO=none
VLAN=yes

So with these flat files we do this->
modprobe bonding
modprobe 8021q
ifup ifcfg-eth1

This is where the problem is->
Error: Connection activation failed: Master device bond0 unmanaged or not available for activation

So even if I try to up the bond0 first (and this is with vagrant so I can destroy and start from scratch)
Error: Connection activation failed: Master device bond0 unmanaged or not available for activation

So it's like a chicken-and-egg thing? I have found that I have to 'jiggle' the bond, and I literally can't figure out the magic combination to get it up. Once it's up it will work great and stay up, unless I do a vagrant destroy on the RHEL VM.

One time to get it working I did a
ifdown bond0
ip link del bond0
ifdown eth1
ifup eth1

Then it came online magically. But this does not happen every time. Once it's up I can do a nmcli conn reload and it's stable... but this initial setup seems very sketchy/flaky, and it makes using Ansible impossible since I basically have to put ignore_errors: yes on everything :(

What should I do here? What additional information do you need? The bond on the Cumulus Linux side is not the problem (it bonds to Cisco, Arista, Debian, Ubuntu, etc.), so this is definitely a RHEL/CentOS thing.

Responses

Your config files look good, and you shouldn't have to modprobe anything.

If you're using the initscripts, then just ifup bond0 or service network restart should do it.

If you're using NetworkManager, I think you could just nmcli con up bond0 or maybe one of the slaves. Personally I would use nmtui.

So even if I try to up the bond0 first (and this is with vagrant so I can destroy and start from scratch to make sure it's vanilla every time): Error: Connection activation failed: Master device bond0 unmanaged or not available for activation

I will try the other thing, but I am pretty sure service network restart is the same as systemctl restart network, which also threw an error (which of course I forgot to save off...). Let me get the other outputs.

I was talking to a co-worker running Debian Jessie; he said he hit a similar issue and suggested that I not use the name bond0. I switched my config so the bond was named sean (literally my name) and it worked on the first try...

(rips hair out)

[vagrant@server01 ~]$ nmcli connection show
NAME                UUID                                  TYPE            DEVICE
Wired connection 1  a256851d-0332-48a1-bea4-f65356f9ab50  802-3-ethernet  --
System vagrant      586a4c92-991e-389e-cb8a-feecc2e69af1  802-3-ethernet  vagrant
System eth0         5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03  802-3-ethernet  eth0
sean                ad33d8b0-1f7b-cab9-9447-ba07f855b143  bond            sean
Vlan bond0.400      e34270cb-0d34-f04d-9ee6-a6ac582b220b  vlan            sean.400
sean-slave          9c92fad9-6ecb-3e6c-eb4d-8a47c6f50c04  802-3-ethernet  eth1
Wired connection 2  5f471392-c3cc-4655-b3bf-36dd511c2c24  802-3-ethernet  --

I noticed there is still a bond0 in there... so it indeed looks like it's reserved for some reason? AHHHHH

More evidence:

[root@server02 ~]# ls /proc/net/bonding/
bond0  sean
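(Side note for anyone else debugging this: the kernel exposes per-bond LACP state under /proc/net/bonding, which is a quick way to confirm the aggregator actually negotiated. The interface name here follows this thread's example; the exact field labels can vary slightly between kernel versions.)

```shell
# Inspect the negotiated state of the bond named "sean"
cat /proc/net/bonding/sean
# Things worth checking in the output:
#   Bonding Mode: IEEE 802.3ad Dynamic link aggregation
#   Transmit Hash Policy: layer3+4
#   matching "Aggregator ID" values on both slave interfaces
```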

This is the most annoying bug I have ever encountered in Linux networking. Hope this saves someone some time on systemd networking.

You should be able to name a bond bond0 with NetworkManager; I have RHEL 7 systems set up that way.

It's possible that modprobing the bonding module first has created an interface called bond0, and NM is running into problems layering a new connection (the NM abstraction) over an existing device.

It's not too convenient, but I think with your current config files you could probably just reboot and have everything come up as bond0.

@Jamie

So you mentioned before that you don't need to modprobe (e.g. modprobe --first-time bonding), but the official RHEL docs are very specific here-> "In Red Hat Enterprise Linux 7, the bonding module is not loaded by default. You can load the module by issuing the following command as root"

Which is what I did. Rebooting is probably fine, but AFAIK it's not possible to automate with something like nmtui (I need to be able to build dozens and dozens of servers with hundreds of VLANs). The 'CLI' method is the method I am trying to use. I guess it is also possible to use nmcli. What is the recommendation here from RHEL?

Hello, the next paragraph has a sentence "Note that given a correct configuration file using the BONDING_OPTS directive, the bonding module will be loaded as required and therefore does not need to be loaded separately." Maybe we should ask for that to be moved to the first paragraph.

Especially since the example shows BONDING_OPTS in there. And even if I don't modprobe bonding, the directions also say to modprobe 8021q. Is that not the case either? It seems to work fine when I remove the modprobe.

However, the bond0 is still there regardless of whether I modprobe or not. This is on 7.2... I am going to add more output below:

Ah, that's in Section 4.4 about using the command line. Section 4.3 about using nmcli doesn't list such a requirement.

From an updated RHEL 7 system with no bonding module loaded, following Section 4.3, I was able to get a bond named bond0 up and working fine:

# nmcli con add type bond con-name bond0 ifname bond0 mode active-backup
# nmcli con add type bond-slave ifname eth1 master bond0
# nmcli con up bond-slave-eth1
# nmcli con up bond0

These steps create the ifcfg- files for the bond and slaves too.

My recommendation would be to use nmtui if possible, nmcli if not.

The previous author of the RHEL 7 Networking Guide definitely did test everything in there, though it's possible the behaviour of NM has changed with the commandline steps since RHEL 7.0.

If you can supply a reproducible set of steps showing where the doc is wrong or doesn't behave as desired, please do log a bug against the doc-Networking_Guide component and the current author(s) can review the steps.

It does not look like nmcli can set the hash policy, which makes nmcli useless to me since I need layer3+4. Maybe I am just missing it and don't see what I need to see due to my newbness on RHEL. I see lacp-rate, miimon and 802.3ad, but not hash.

[vagrant@server01 ~]$ nmcli con add type bond con-name bond0 ifname bond0 mode 802.3ad lacp-rate 1
arp-interval   arp-ip-target  downdelay      gw4            gw6            ip4            ip6            miimon         primary        updelay
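(For anyone hitting this later: on more recent NetworkManager builds there is a bond.options property that accepts arbitrary kernel bonding options, including the hash policy, alongside or instead of the dedicated keywords. This is an assumption to verify on your specific RHEL 7.x point release; the completion output above suggests it was not available on this system.)

```shell
# Hedged example: pass all bonding options as one string via bond.options
nmcli con add type bond con-name bond0 ifname bond0 \
    bond.options "mode=802.3ad,miimon=100,lacp_rate=fast,xmit_hash_policy=layer3+4"

# Or append a single option to an existing bond connection
nmcli con mod bond0 +bond.options "xmit_hash_policy=layer3+4"
```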

So following your steps (please note that since my original post I renamed the interfaces to uplink1/uplink2 because of an issue with systemd where it does not want you to use eth1/eth2 if you are renaming interfaces):

[vagrant@server01 ~]$ sudo nmcli con add type bond con-name bond0 ifname bond0 mode 802.3ad lacp-rate 1
Connection 'bond0' (9bcf0352-046a-4a6d-8281-0ebf1ecb195f) successfully added.
[vagrant@server01 ~]$ sudo nmcli con add type bond-slave ifname uplink1 master bond0
Connection 'bond-slave-uplink1' (f3e6fc8e-56c4-45d6-be81-4302ea0fae58) successfully added.
[vagrant@server01 ~]$ sudo nmcli con up bond-slave-uplink1
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/12)
[vagrant@server01 ~]$ sudo nmcli con up bond0
Connection successfully activated (master waiting for slaves) (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/13)

The connection came up! (Looks good except for the hash.) Although I need to add VLANs to it now... not sure how to do that with nmcli; I need to read more. If this were to be automated I would literally need a command/shell task per nmcli command, which does not seem very Ansible-friendly.
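(For the VLAN piece: nmcli can layer an 802.1Q interface on top of the bond, and it writes the corresponding ifcfg- file for you. Something like the following should mirror the ifcfg-bond0.400 file earlier in this thread; the names and address are just this thread's example values.)

```shell
# Create VLAN 400 on top of bond0 with a static address, then activate it
nmcli con add type vlan con-name bond0.400 ifname bond0.400 \
    dev bond0 id 400 ip4 10.11.0.100/24
nmcli con up bond0.400
```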

If nmcli is just creating flat files, I would prefer to modify the flat files directly, since that seems to be the normal way of modifying Linux for DevOps tools. I am not sure why RHEL/CentOS uses the NM tool; it seems to create more overhead than it solves. nmcli does have an Ansible module, but it is also missing the hash policy. I honestly need to read more about nmcli; I am not sure why it is preferred. Understand my background is Debian/Ubuntu and obviously Cumulus Linux, where we use ifupdown and ifupdown2.

So here is what I ended up doing: I took your earlier advice and just did a reboot after the flat-file configuration, and this works flawlessly and does not require any 'weird' quirks. I simply configure the flat files, reboot, wait for SSH to come back, then check routing to make sure my default route is set up in-band correctly. It seems bond0 is only 'reused' when the CLI method is used, apparently. shrug

I wonder if it would be possible to 'port' ifupdown2 from Cumulus Linux and install it directly onto RHEL/CentOS. That would be really cool to see. We have a feature called ifreload which takes care of all these problems and works really well with automation (e.g. Ansible).
