`br-ex` interface is missing after a node reboot during a router power outage
Issue
When a network outage occurs, such as a router power failure, the node may be rebooted one or more times before the root cause is identified. After the network recovers, the SNO cluster cannot recover on its own. The main symptom is that the br-ex interface is missing:
Interface Status:
br-int link=DOWN rx ring UNKNOWN drv openvswitch v4.18.0-305.72.1.rt7.144.el8_4.x / fw UNKNOWN
eno1 0000:03:00.0 link=up 1000Mb/s full (autoneg=Y) rx ring 256/4096 drv igb v4.18.0-305.72.1.rt7.144.el8_4.x / fw 3.25, 0x800005db, 1.2877.0
ens1f0 0000:10:00.0 link=up 10000Mb/s full (autoneg=N) rx ring 2048/8160 drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
ens1f0.111 link=up 10000Mb/s full (autoneg=N) rx ring UNKNOWN drv 802.1Q VLAN Support v1.8 / fw N/A
ens1f0v0 0000:10:01.0 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f0v1 0000:10:01.1 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f0v2 0000:10:01.2 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f0v3 0000:10:01.3 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f0v4 0000:10:01.4 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f0v5 0000:10:01.5 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f0v6 0000:10:01.6 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f0v7 0000:10:01.7 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f1 0000:10:00.1 link=up 10000Mb/s full (autoneg=N) rx ring 2048/8160 drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
ens1f1v0 0000:10:09.0 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f1v1 0000:10:09.1 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f1v2 0000:10:09.2 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f1v3 0000:10:09.3 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f1v4 0000:10:09.4 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f1v5 0000:10:09.5 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f2 0000:10:00.2 link=DOWN rx ring 2048/8160 drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
ens1f2v0 0000:10:11.0 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f2v1 0000:10:11.1 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f2v2 0000:10:11.2 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f2v3 0000:10:11.3 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f2v4 0000:10:11.4 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f2v5 0000:10:11.5 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f3 0000:10:00.3 link=DOWN rx ring 2048/8160 drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
ens1f3v0 0000:10:19.0 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f3v1 0000:10:19.1 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f3v2 0000:10:19.2 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f3v3 0000:10:19.3 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f3v4 0000:10:19.4 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens1f3v5 0000:10:19.5 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens2f0 0000:12:00.0 link=DOWN rx ring 2048/8160 drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
ens2f0v2 0000:12:01.2 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens2f0v3 0000:12:01.3 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens2f0v4 0000:12:01.4 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens2f0v5 0000:12:01.5 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens2f0v6 0000:12:01.6 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens2f0v7 0000:12:01.7 link=DOWN rx ring 512/4096 drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
ens2f1 0000:12:00.1 link=DOWN rx ring 2048/8160 drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
ens2f2 0000:12:00.2 link=DOWN rx ring 2048/8160 drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
ens2f3 0000:12:00.3 link=DOWN rx ring 2048/8160 drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
lo PCI UNKNOWN link=up rx ring UNKNOWN drv UNKNOWN / fw UNKNOWN
ovn-k8s-mp0 link=up rx ring UNKNOWN drv openvswitch v4.18.0-305.72.1.rt7.144.el8_4.x / fw UNKNOWN
ovs-system link=DOWN rx ring UNKNOWN drv openvswitch v4.18.0-305.72.1.rt7.144.el8_4.x / fw UNKNOWN
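To confirm the symptom from a saved copy of a listing like the one above, a minimal sketch follows. It assumes one interface per line with the interface name in the first column. The helper name `iface_listed` and the file name `interfaces.txt` are hypothetical and are not part of any OpenShift tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeed if interface NAME appears in the first column
# of a listing read from stdin (one interface per line).
iface_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example use against a saved listing (the file name is an assumption):
#   iface_listed br-ex < interfaces.txt || echo "br-ex is missing"
# On a live node, the kernel can be asked directly instead:
#   ip link show br-ex
```

The match is on the whole first field, so a name such as br-int does not count as br-ex.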
The ovs-configuration systemd service failed:
Jan 25 03:26:24 host.example.com systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Jan 25 03:26:24 host.example.com systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=1/FAILURE
Jan 25 03:26:24 host.example.com rpc.statd[3632]: Version 2.3.3 starting
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: ens1f0.111:f70e08cd-1615-441d-90f7-f38d069baf96:vlan:1673842172:Mon Jan 16 04\:09\:32 2023:yes:1:no:/org/freedesktop/NetworkManager/Settings/2:no:::::/etc/NetworkManager/system-connections/ens1f0.111.nmconnection
Jan 25 03:26:24 host.example.com systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Jan 25 03:26:24 host.example.com rpc.statd[3632]: Flags: TI-RPC
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: + ip -d address show
Jan 25 03:26:24 host.example.com mco-hostname[3599]: waiting for non-localhost hostname to be assigned
Jan 25 03:26:24 host.example.com mco-hostname[3599]: node identified as host.example.com
Jan 25 03:26:24 host.example.com systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: inet 127.0.0.1/8 scope host lo
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: valid_lft forever preferred_lft forever
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: inet6 ::1/128 scope host
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: valid_lft forever preferred_lft forever
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9216 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: inet6 fe80::cd4f:525f:c5e8:87d1/64 scope link noprefixroute
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: valid_lft forever preferred_lft forever
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 3: ens1f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 4: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: inet6 fe80::6597:75ba:7e97:42cf/64 scope link noprefixroute
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: valid_lft forever preferred_lft forever
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 5: ens1f2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 6: ens1f3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 7: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com systemd[1]: ovs-configuration.service: Consumed 770ms CPU time
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 8: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 9: ens2f2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 10: ens2f3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 1 minmtu 68 maxmtu 65535
promiscuity 0 minmtu 0 maxmtu 65535
Jan 25 03:26:24 host.example.com systemd[1]: Starting Wait for a non-localhost hostname...
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: vlan protocol 802.1Q id 111 <REORDER_HDR> numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com systemd[1]: Started Wait for a non-localhost hostname.
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: + ip route show
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: + ip -6 route show
Jan 25 03:26:24 host.example.com systemd[1]: Reached target Network is Online.
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: ::1 dev lo proto kernel metric 256 pref medium
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: fe80::/64 dev eno1 proto kernel metric 100 pref medium
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: fe80::/64 dev ens1f1 proto kernel metric 101 pref medium
Jan 25 03:26:24 host.example.com bash[3606]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b77987af95367456f94a6dc6eb2e8aa31c98411d5f08f91e88aced0bfebf5f91...
Jan 25 03:26:24 host.example.com bash[3606]: time="2023-01-25T03:26:24Z" level=warning msg="failed, retrying in 1s ... (1/3). Error: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b77987af95367456f94a6dc6eb2e8aa31c98411d5f08f91e88aced0bfebf5f91: error pinging docker registry quay.io: Get \"https://quay.io/v2/\": proxyconnect tcp: dial tcp 10.131.50.202:3128: connect: network is unreachable"
Jan 25 03:26:24 host.example.com systemd[1]: Starting Crash recovery kernel arming...
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: + exit 1
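Because the journal output above interleaves several units, it can help to filter for the systemd failure record of this one service. The sketch below assumes the journal text is supplied on stdin; `ovs_config_failed` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed if journal text on stdin contains the systemd
# failure record for ovs-configuration.service seen in the log above.
ovs_config_failed() {
    grep -q 'ovs-configuration\.service: Failed with result'
}

# Example use on a live node (current boot only):
#   journalctl -b -u ovs-configuration.service | ovs_config_failed && echo "failed"
# systemd can also report the unit state directly:
#   systemctl is-failed ovs-configuration.service
```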
Because ovs-configuration is a one-shot systemd service (Type=oneshot), the recovery of the network does not trigger the failed service to run again, so the broken OVS configuration is not fixed automatically.
cat /etc/systemd/system/ovs-configuration.service
[Unit]
Description=Configures OVS with proper host networking configuration
# Removal of this file signals firstboot completion
ConditionPathExists=!/etc/ignition-machine-config-encapsulated.json
# This service is used to move a physical NIC into OVS and reconfigure OVS to use the host IP
Requires=openvswitch.service
Wants=NetworkManager-wait-online.service
After=NetworkManager-wait-online.service openvswitch.service network.service
Before=network-online.target kubelet.service crio.service node-valid-hostname.service
[Service]
# Need oneshot to delay kubelet
Type=oneshot
ExecStart=/usr/local/bin/configure-ovs.sh OVNKubernetes
StandardOutput=journal+console
StandardError=journal+console
[Install]
WantedBy=network-online.target
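The service type discussed above can be read out of a unit file with a small filter. This is a sketch only: `unit_type` is a hypothetical helper, and it assumes the unit file text contains a single `Type=` line.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the Type= line from unit file text
# read on stdin.
unit_type() {
    sed -n 's/^Type=//p'
}

# Example use:
#   unit_type < /etc/systemd/system/ovs-configuration.service
# On a live node, systemd reports the type together with the last result:
#   systemctl show -p Type -p ActiveState -p Result ovs-configuration.service
```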
Environment
- Red Hat OpenShift Container Platform (RHOCP)
    - 4.9
    - 4.10
    - 4.11
- Single Node OpenShift (SNO)
- OVN-Kubernetes