RHEL 8.7 port flapping

I have two servers whose network ports keep flapping.
I am on RHEL 8.7, with a network bond.
I have tried both active-backup and 802.3ad modes, with the same issue.
The bonded interfaces are 1 and 3 (eno1 and eno3).

One NIC goes to switch1 and the other to switch2.
I tested moving both NICs to the same switch, with no luck.

I have also tried new cables and different switch ports, and it still happens.

When it happens, both NICs fail at the same time, so the bond gives no redundancy.

Is there a way to enable some debugging to check whether the issue is on the server or on the switch, or anything else that would show what is happening?
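One server-side check, as a rough sketch rather than a definitive method: the kernel bonding driver exposes per-port state in /proc/net/bonding/bond0, including a "Link Failure Count" for each port. The Python below parses that text. The function names are made up for illustration, and the idea that a count that stays flat across a flap points away from a physical link drop is an assumption, not something confirmed here.

```python
def parse_bonding_status(text):
    """Return {port_name: {field: value}} from /proc/net/bonding/<bond> text.

    Fields that appear before the first "Slave Interface" line describe the
    bond itself and are skipped in this sketch.
    """
    ports = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so MAC addresses stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = ports.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return ports


def link_failure_counts(text):
    """Map each port to its 'Link Failure Count' as an integer."""
    return {
        name: int(fields.get("Link Failure Count", "0"))
        for name, fields in parse_bonding_status(text).items()
    }
```

For example, calling link_failure_counts(open("/proc/net/bonding/bond0").read()) before and after a flap and comparing the two results would show whether the bond itself registered a carrier loss on eno1 or eno3.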

Here is some of the logging from /var/log/messages (currently configured as 802.3ad):

Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.7376] audit: op="connection-update" uuid="ece85ef4-089d-4170-85a1-d329f0f2f290" name="bond0" args="connection.timestamp" pid=675351 uid=0 result="success"
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.7549] agent-manager: agent[828347294f982870,:1.842749/nmcli-connect/0]: agent registered
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.7555] device (bond0): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 dbus-daemon[2013]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.4' (uid=0 pid=2040 comm="/usr/sbin/NetworkManager --no-daemon ")
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.7562] manager: NetworkManager state is now CONNECTED_LOCAL
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.7565] device (bond0): disconnecting for new activation request.
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.7565] audit: op="connection-activate" uuid="ece85ef4-089d-4170-85a1-d329f0f2f290" name="bond0" pid=675355 uid=0 result="success"
Nov  1 01:25:36 server1 systemd[1]: Starting Network Manager Script Dispatcher Service...
Nov  1 01:25:36 server1 dbus-daemon[2013]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Nov  1 01:25:36 server1 systemd[1]: Started Network Manager Script Dispatcher Service.
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.7738] device (bond0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 kernel: bond0: (slave eno1): Releasing backup interface
Nov  1 01:25:36 server1 kernel: bond0: (slave eno1): the permanent HWaddr of slave - MAC_Address - is still in use by bond - set the HWaddr of slave to a different address to avoid conflicts
Nov  1 01:25:36 server1 kernel: bond0: active interface up!
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.8272] device (bond0): detached bond port eno1
Nov  1 01:25:36 server1 kernel: IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
Nov  1 01:25:36 server1 kernel: bond0: (slave eno3): Removing an active aggregator
Nov  1 01:25:36 server1 kernel: bond0: (slave eno3): Releasing backup interface
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9173] device (bond0): detached bond port eno3
Nov  1 01:25:36 server1 kernel: IPv6: ADDRCONF(NETDEV_UP): eno3: link is not ready
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9708] device (bond0): Activation: starting connection 'bond0' (ece85ef4-089d-4170-85a1-d329f0f2f290)
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9728] device (eno1): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9735] device (eno3): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9742] device (bond0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9744] manager: NetworkManager state is now CONNECTING
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9748] device (bond0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9752] device (bond0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9756] policy: set 'bond0' (bond0) as default for IPv4 routing and DNS
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9768] device (eno1): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 kernel: IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9776] device (eno3): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 kernel: IPv6: ADDRCONF(NETDEV_UP): eno3: link is not ready
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9799] policy: auto-activating connection 'eno1' (1140f3e6-8308-4d92-8a8f-0935e9535ba3)
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9801] policy: auto-activating connection 'eno3' (7b645071-92f6-4c95-ba8e-7c39316d283d)
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9803] device (eno1): Activation: starting connection 'eno1' (1140f3e6-8308-4d92-8a8f-0935e9535ba3)
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9805] device (eno3): Activation: starting connection 'eno3' (7b645071-92f6-4c95-ba8e-7c39316d283d)
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9806] device (eno1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9808] device (eno1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9814] device (eno3): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9818] device (eno3): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:36 server1 NetworkManager[2040]: <info>  [1698801936.9823] device (eno1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 kernel: bond0: (slave eno1): Enslaving as a backup interface with a down link
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.0792] device (bond0): attached bond port eno1
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.0792] device (eno1): Activation: connection 'eno1' enslaved, continuing activation
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.0796] device (eno3): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 kernel: bond0: (slave eno3): Enslaving as a backup interface with a down link
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.1622] device (bond0): attached bond port eno3
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.1622] device (eno3): Activation: connection 'eno3' enslaved, continuing activation
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.1805] device (eno1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.1810] device (eno3): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.1833] device (eno1): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.1834] device (eno3): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.1836] device (eno1): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.1840] device (eno1): Activation: successful, device activated.
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.1842] device (eno3): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.1846] device (eno3): Activation: successful, device activated.
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.2252] audit: op="connection-update" uuid="ece85ef4-089d-4170-85a1-d329f0f2f290" name="bond0" args="connection.timestamp" pid=675429 uid=0 result="success"
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.2441] agent-manager: agent[aa97025a06f56f5a,:1.842758/nmcli-connect/0]: agent registered
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.2447] device (bond0): state change: ip-config -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.2449] manager: NetworkManager state is now CONNECTED_LOCAL
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.2456] device (bond0): disconnecting for new activation request.
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.2457] audit: op="connection-activate" uuid="ece85ef4-089d-4170-85a1-d329f0f2f290" name="bond0" pid=675448 uid=0 result="success"
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.2474] device (bond0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 kernel: bond0: (slave eno1): Releasing backup interface
Nov  1 01:25:37 server1 kernel: bond0: (slave eno1): the permanent HWaddr of slave - MAC_Address - is still in use by bond - set the HWaddr of slave to a different address to avoid conflicts
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.2983] device (bond0): detached bond port eno1
Nov  1 01:25:37 server1 kernel: IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
Nov  1 01:25:37 server1 kernel: bond0: (slave eno3): Releasing backup interface
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.3880] device (bond0): detached bond port eno3
Nov  1 01:25:37 server1 kernel: IPv6: ADDRCONF(NETDEV_UP): eno3: link is not ready
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4408] device (bond0): Activation: starting connection 'bond0' (ece85ef4-089d-4170-85a1-d329f0f2f290)
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4424] device (eno1): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4430] device (eno3): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4438] device (bond0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4440] manager: NetworkManager state is now CONNECTING
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4446] device (bond0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4451] device (bond0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4454] device (eno1): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 kernel: IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4462] device (eno3): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 kernel: IPv6: ADDRCONF(NETDEV_UP): eno3: link is not ready
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4475] policy: set 'bond0' (bond0) as default for IPv4 routing and DNS
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4492] policy: auto-activating connection 'eno1' (1140f3e6-8308-4d92-8a8f-0935e9535ba3)
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4494] policy: auto-activating connection 'eno3' (7b645071-92f6-4c95-ba8e-7c39316d283d)
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4500] device (eno1): Activation: starting connection 'eno1' (1140f3e6-8308-4d92-8a8f-0935e9535ba3)
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4502] device (eno3): Activation: starting connection 'eno3' (7b645071-92f6-4c95-ba8e-7c39316d283d)
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4503] device (eno1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4508] device (eno1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4513] device (eno3): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4516] device (eno3): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.4522] device (eno1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 kernel: bond0: (slave eno1): Enslaving as a backup interface with a down link
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.5321] device (bond0): attached bond port eno1
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.5322] device (eno1): Activation: connection 'eno1' enslaved, continuing activation
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.5325] device (eno3): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 kernel: bond0: (slave eno3): Enslaving as a backup interface with a down link
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.6143] device (bond0): attached bond port eno3
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.6143] device (eno3): Activation: connection 'eno3' enslaved, continuing activation
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.6335] device (eno1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.6341] device (eno3): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.6358] device (eno1): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.6359] device (eno3): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.6361] device (eno1): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.6366] device (eno1): Activation: successful, device activated.
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.6369] device (eno3): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:37 server1 NetworkManager[2040]: <info>  [1698801937.6373] device (eno3): Activation: successful, device activated.
Nov  1 01:25:39 server1 NetworkManager[2040]: <info>  [1698801939.1810] device (eno1): carrier: link connected
Nov  1 01:25:39 server1 kernel: igb 0000:01:00.0 eno1: igb: eno1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX
Nov  1 01:25:39 server1 NetworkManager[2040]: <info>  [1698801939.2469] device (eno3): carrier: link connected
Nov  1 01:25:39 server1 kernel: igb 0000:01:00.2 eno3: igb: eno3 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX
Nov  1 01:25:39 server1 NetworkManager[2040]: <info>  [1698801939.4493] device (bond0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:39 server1 NetworkManager[2040]: <info>  [1698801939.4509] device (bond0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:39 server1 NetworkManager[2040]: <info>  [1698801939.4511] device (bond0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Nov  1 01:25:39 server1 NetworkManager[2040]: <info>  [1698801939.4514] manager: NetworkManager state is now CONNECTED_SITE
Nov  1 01:25:39 server1 NetworkManager[2040]: <info>  [1698801939.4516] device (bond0): Activation: successful, device activated.
Nov  1 01:25:39 server1 NetworkManager[2040]: <info>  [1698801939.4520] manager: NetworkManager state is now CONNECTED_GLOBAL
Nov  1 01:25:39 server1 NetworkManager[2040]: <info>  [1698801939.4865] device (bond0): carrier: link connected
Nov  1 01:25:39 server1 kernel: bond0: (slave eno1): link status definitely up, 100 Mbps full duplex
Nov  1 01:25:39 server1 kernel: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
Nov  1 01:25:39 server1 kernel: bond0: active interface up!
Nov  1 01:25:39 server1 kernel: bond0: (slave eno3): link status definitely up, 100 Mbps full duplex
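
Reading the excerpt above: each bond0 teardown follows an audited "connection-activate" request (reason 'new-activation'), both links come up at 100 Mbps, and the kernel warns that there is no 802.3ad response from the link partner. To see whether that pattern holds over a longer period, a small Python sketch like the following could tally those events from /var/log/messages. The labels and function names are hypothetical, and the "NIC Link is Down" pattern is an assumption about the igb driver's wording, since no such line appears in the excerpt.

```python
import re
from collections import Counter

# Labels are illustrative names; patterns are matched against raw syslog lines.
PATTERNS = {
    "nm_activate_request": re.compile(r'audit: op="connection-activate"'),
    "nic_link_up": re.compile(r"NIC Link is Up"),
    "nic_link_down": re.compile(r"NIC Link is Down"),  # assumed wording
    "no_lacp_partner": re.compile(r"No 802\.3ad response from the link partner"),
}

ACTIVATE_RE = re.compile(
    r'^(\w{3}\s+\d+\s+[\d:]{8}).*audit: op="connection-activate"'
    r'.*name="([^"]+)".*pid=(\d+)'
)


def classify_events(lines):
    """Count how many lines match each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


def activation_requests(lines):
    """Return (timestamp, connection_name, pid) for each activation request."""
    found = []
    for line in lines:
        match = ACTIVATE_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2), int(match.group(3))))
    return found
```

If activation requests keep appearing with no "Link is Down" lines near them, that would suggest a process on the server (running as uid=0 in this excerpt) is re-activating the connection, rather than the switch dropping the link. That is only one reading of this excerpt.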

Thanks

Responses