`br-ex` interface is gone after a node reboot during a router power outage

Solution Verified - Updated -

Issue

When a network outage occurs (for example, a router power failure), the node may be rebooted one or more times before the root cause is known. After the network recovers, the Single Node OpenShift (SNO) cluster cannot recover on its own. The main symptom is that the br-ex interface is gone:

  Interface Status:
    br-int                     link=DOWN                           rx ring UNKNOWN    drv openvswitch v4.18.0-305.72.1.rt7.144.el8_4.x / fw UNKNOWN
    eno1         0000:03:00.0  link=up 1000Mb/s full (autoneg=Y)   rx ring 256/4096   drv igb v4.18.0-305.72.1.rt7.144.el8_4.x / fw 3.25, 0x800005db, 1.2877.0
    ens1f0       0000:10:00.0  link=up 10000Mb/s full (autoneg=N)  rx ring 2048/8160  drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
    ens1f0.111                 link=up 10000Mb/s full (autoneg=N)  rx ring UNKNOWN    drv 802.1Q VLAN Support v1.8 / fw N/A
    ens1f0v0     0000:10:01.0  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f0v1     0000:10:01.1  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f0v2     0000:10:01.2  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f0v3     0000:10:01.3  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f0v4     0000:10:01.4  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f0v5     0000:10:01.5  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f0v6     0000:10:01.6  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f0v7     0000:10:01.7  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f1       0000:10:00.1  link=up 10000Mb/s full (autoneg=N)  rx ring 2048/8160  drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
    ens1f1v0     0000:10:09.0  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f1v1     0000:10:09.1  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f1v2     0000:10:09.2  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f1v3     0000:10:09.3  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f1v4     0000:10:09.4  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f1v5     0000:10:09.5  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f2       0000:10:00.2  link=DOWN                           rx ring 2048/8160  drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
    ens1f2v0     0000:10:11.0  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f2v1     0000:10:11.1  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f2v2     0000:10:11.2  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f2v3     0000:10:11.3  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f2v4     0000:10:11.4  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f2v5     0000:10:11.5  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f3       0000:10:00.3  link=DOWN                           rx ring 2048/8160  drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
    ens1f3v0     0000:10:19.0  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f3v1     0000:10:19.1  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f3v2     0000:10:19.2  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f3v3     0000:10:19.3  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f3v4     0000:10:19.4  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens1f3v5     0000:10:19.5  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens2f0       0000:12:00.0  link=DOWN                           rx ring 2048/8160  drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
    ens2f0v2     0000:12:01.2  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens2f0v3     0000:12:01.3  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens2f0v4     0000:12:01.4  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens2f0v5     0000:12:01.5  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens2f0v6     0000:12:01.6  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens2f0v7     0000:12:01.7  link=DOWN                           rx ring 512/4096   drv iavf v4.18.0-305.72.1.rt7.144.el8_4.x / fw N/A
    ens2f1       0000:12:00.1  link=DOWN                           rx ring 2048/8160  drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
    ens2f2       0000:12:00.2  link=DOWN                           rx ring 2048/8160  drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
    ens2f3       0000:12:00.3  link=DOWN                           rx ring 2048/8160  drv ice v4.18.0-305.72.1.rt7.144.el8_4.x / fw 2.54 0x8000cf16 1.2960.0
    lo           PCI UNKNOWN   link=up                             rx ring UNKNOWN    drv UNKNOWN / fw UNKNOWN
    ovn-k8s-mp0                link=up                             rx ring UNKNOWN    drv openvswitch v4.18.0-305.72.1.rt7.144.el8_4.x / fw UNKNOWN
    ovs-system                 link=DOWN                           rx ring UNKNOWN    drv openvswitch v4.18.0-305.72.1.rt7.144.el8_4.x / fw UNKNOWN
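
The same symptom can be confirmed from a shell on the node by checking whether br-ex appears in the interface list. The following is a minimal sketch, assuming the output of `ip -o link show` is piped in on stdin; the helper name `has_iface` is hypothetical and is not part of any OpenShift tooling.

```shell
# has_iface NAME
# Reads `ip -o link show` style output on stdin and succeeds (exit 0)
# only if an interface called NAME is listed.
has_iface() {
    # Field 2 looks like "br-ex:" or, for VLANs, "ens1f0.111@ens1f0:".
    # Strip the trailing colon and any "@parent" suffix before comparing.
    awk -v n="$1" '
        { sub(/:$/, "", $2); sub(/@.*/, "", $2); if ($2 == n) found = 1 }
        END { exit !found }
    '
}
```

On an affected node, `ip -o link show | has_iface br-ex || echo "br-ex is missing"` would be expected to print the message.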

The ovs-configuration systemd service failed:

Jan 25 03:26:24 host.example.com systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.

Jan 25 03:26:24 host.example.com systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=1/FAILURE
Jan 25 03:26:24 host.example.com rpc.statd[3632]: Version 2.3.3 starting
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: ens1f0.111:f70e08cd-1615-441d-90f7-f38d069baf96:vlan:1673842172:Mon Jan 16 04\:09\:32 2023:yes:1:no:/org/freedesktop/NetworkManager/Settings/2:no:::::/etc/NetworkManager/system-connections/ens1f0.111.nmconnection
Jan 25 03:26:24 host.example.com systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Jan 25 03:26:24 host.example.com rpc.statd[3632]: Flags: TI-RPC
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: + ip -d address show
Jan 25 03:26:24 host.example.com mco-hostname[3599]: waiting for non-localhost hostname to be assigned
Jan 25 03:26:24 host.example.com mco-hostname[3599]: node identified as host.example.com
Jan 25 03:26:24 host.example.com systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     inet 127.0.0.1/8 scope host lo
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:        valid_lft forever preferred_lft forever
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     inet6 ::1/128 scope host
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:        valid_lft forever preferred_lft forever
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9216 numtxqueues 8 numrxqueues 8 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     inet6 fe80::cd4f:525f:c5e8:87d1/64 scope link noprefixroute
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:        valid_lft forever preferred_lft forever
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 3: ens1f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 4: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     inet6 fe80::6597:75ba:7e97:42cf/64 scope link noprefixroute
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:        valid_lft forever preferred_lft forever
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 5: ens1f2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 6: ens1f3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 7: ens2f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com systemd[1]: ovs-configuration.service: Consumed 770ms CPU time
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 8: ens2f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 9: ens2f2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: 10: ens2f3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 0 minmtu 68 maxmtu 9702 numtxqueues 64 numrxqueues 64 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     link/ether 00:00:00:00:00:bb brd 00:00:00:00:00:aa promiscuity 1 minmtu 68 maxmtu 65535
promiscuity 0 minmtu 0 maxmtu 65535
Jan 25 03:26:24 host.example.com systemd[1]: Starting Wait for a non-localhost hostname...
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]:     vlan protocol 802.1Q id 111 <REORDER_HDR> numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Jan 25 03:26:24 host.example.com systemd[1]: Started Wait for a non-localhost hostname.
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: + ip route show
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: + ip -6 route show
Jan 25 03:26:24 host.example.com systemd[1]: Reached target Network is Online.
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: ::1 dev lo proto kernel metric 256 pref medium
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: fe80::/64 dev eno1 proto kernel metric 100 pref medium
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: fe80::/64 dev ens1f1 proto kernel metric 101 pref medium
Jan 25 03:26:24 host.example.com bash[3606]: Trying to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b77987af95367456f94a6dc6eb2e8aa31c98411d5f08f91e88aced0bfebf5f91...
Jan 25 03:26:24 host.example.com bash[3606]: time="2023-01-25T03:26:24Z" level=warning msg="failed, retrying in 1s ... (1/3). Error: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b77987af95367456f94a6dc6eb2e8aa31c98411d5f08f91e88aced0bfebf5f91: error pinging docker registry quay.io: Get \"https://quay.io/v2/\": proxyconnect tcp: dial tcp 10.131.50.202:3128: connect: network is unreachable"
Jan 25 03:26:24 host.example.com systemd[1]: Starting Crash recovery kernel arming...
Jan 25 03:26:24 host.example.com configure-ovs.sh[3180]: + exit 1
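
In a long journal like the one above, the line that matters is the systemd failure message for the unit. The following is a minimal sketch of filtering for it, assuming journal text is supplied on stdin; the helper name `ovs_config_failed` is hypothetical.

```shell
# ovs_config_failed
# Reads journal lines on stdin and succeeds (exit 0) only if systemd
# reported that ovs-configuration.service ended in a failed state.
ovs_config_failed() {
    grep -q "ovs-configuration\.service: Failed with result"
}
```

For example, `journalctl -b -u ovs-configuration.service | ovs_config_failed && echo "ovs-configuration failed on this boot"`.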

Because ovs-configuration is a oneshot systemd service (Type=oneshot), it is not rerun when the network recovers, so the broken OVS configuration is not repaired automatically.

cat /etc/systemd/system/ovs-configuration.service

[Unit]
Description=Configures OVS with proper host networking configuration
# Removal of this file signals firstboot completion
ConditionPathExists=!/etc/ignition-machine-config-encapsulated.json
# This service is used to move a physical NIC into OVS and reconfigure OVS to use the host IP
Requires=openvswitch.service
Wants=NetworkManager-wait-online.service
After=NetworkManager-wait-online.service openvswitch.service network.service
Before=network-online.target kubelet.service crio.service node-valid-hostname.service

[Service]
# Need oneshot to delay kubelet
Type=oneshot
ExecStart=/usr/local/bin/configure-ovs.sh OVNKubernetes
StandardOutput=journal+console
StandardError=journal+console

[Install]
WantedBy=network-online.target
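
Since nothing in this unit makes systemd rerun it after a failure, the rerun has to be triggered by hand. The following is a sketch only, under the assumption that restarting the unit after the network is back is an acceptable way to rerun configure-ovs.sh; the text above does not confirm that this alone restores the cluster. The function name `rerun_ovs_configuration` and the `DRY_RUN` variable are hypothetical.

```shell
# rerun_ovs_configuration
# Restarts the oneshot unit so that its ExecStart runs again.
# With DRY_RUN=1 the command is only printed, not executed.
rerun_ovs_configuration() {
    unit="ovs-configuration.service"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "systemctl restart $unit"
    else
        systemctl restart "$unit"
    fi
}
```

Running `DRY_RUN=1 rerun_ovs_configuration` first shows the command that would be executed, without touching the node.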

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.9
    • 4.10
    • 4.11
    • Single Node OpenShift (SNO)
  • OVN-Kubernetes
