OpenShift 4: pods that are HostNetworked on nodes using routingViaHost:true when using a secondary interface for br-ex cannot route to default kubernetes service IP
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4.14
- Nodes are deployed with a secondary interface, and the file `/etc/default/nodeip-configuration` has been updated to point at the desired secondary interface to be selected by `br-ex` during boot with `NODEIP_HINT=<secondary-interface>`
- Secondary interface has been assigned a default route rule in the `/etc/NetworkManager/system-connections/*.nmconnection` network definition file, or `/etc/sysconfig/network-scripts/*.ifcfg` files
- Nodes are deployed with `HostNetwork=True`, and `IPForwarding=Global`
$ oc get network.operator.openshift.io cluster -o jsonpath='{.spec.defaultNetwork}' | jq
{
"ovnKubernetesConfig": {
"egressIPConfig": {},
"gatewayConfig": {
"ipForwarding": "Global",
"routingViaHost": true
},
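The two `gatewayConfig` fields above can be checked programmatically. A minimal sketch, using sample JSON in place of the live `oc` output and `python3` instead of `jq` for portability:

```shell
# Sample of the defaultNetwork JSON rendered by the oc/jq command above.
spec='{"ovnKubernetesConfig":{"gatewayConfig":{"ipForwarding":"Global","routingViaHost":true}}}'

# Pull out just the two gatewayConfig fields this article cares about.
echo "$spec" | python3 -c '
import json, sys
gc = json.load(sys.stdin)["ovnKubernetesConfig"]["gatewayConfig"]
print(gc["routingViaHost"], gc["ipForwarding"])'
# → True Global
```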
- (see notes on the ipForwarding parameter):
properties:
ipForwarding:
description: IPForwarding controls IP forwarding for all traffic on OVN-Kubernetes managed interfaces (such as br-ex). By default this is set to Restricted, and Kubernetes related traffic is still forwarded appropriately, but other IP traffic will not be routed by the OCP node. If there is a desire to allow the host to forward traffic across OVN-Kubernetes managed interfaces, then set this field to "Global". The supported values are "Restricted" and "Global".
type: string
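The gateway configuration described above is applied by patching the cluster network operator spec. A sketch: the merge-patch JSON mirrors the fields shown in the output earlier, and the `oc patch` invocation is left commented since it requires a live cluster.

```shell
# Build the merge patch for gatewayConfig and sanity-check that it is valid
# JSON before handing it to `oc patch`.
PATCH='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"ipForwarding":"Global","routingViaHost":true}}}}}'
echo "$PATCH" | python3 -m json.tool > /dev/null && echo "patch JSON valid"
# → patch JSON valid

# Against a live cluster:
# oc patch network.operator.openshift.io cluster --type=merge -p "$PATCH"
```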
Issue
- Steps have been taken to deploy nodes in the particular configuration outlined in the Environment section above, but pods are crash-looping on startup, or cannot route properly via the desired gateway once deployed.
Resolution
- This issue is being tracked in https://issues.redhat.com/browse/OCPBUGS-27821, which seeks to resolve the problem by removing the requirement for a default gateway on the secondary interface:
If someone deploys with no default gateway on br-ex, we allow OVN-Kubernetes to come up, and we don't configure any default gateway in the OVN gateway router (GR). This was intentionally done some time ago. But we didn't consider the case with local gateway mode (routingViaHost=true) where kube-api is on another interface. So we plan on making a change where, if there is no default gateway detected via br-ex, a default route to the fake masquerade IP 169.254.169.4 will be added. This will allow OVN to get the traffic out to br-ex, where it will be forwarded into the host and work correctly.
- As a workaround, you can address this problem by forcibly defining your desired interface's default route with a lower metric than 48/49, which is set by `configure-ovs.sh` (see [1] in Root Cause below).
[core@testing ~]$ ip route | grep default
default via 11.xxx.xxx.yyy dev bond1 proto static metric 25 ##set value lower on our secondary interface to force priority over automatically set br-ex routing.
default via 10.xxx.xxx.yyy dev br-ex proto static metric 48 ## value is automatically set/cannot be overridden + has (generally) a lower value metric to force priority, but not the LOWEST value.
[core@testing ~]$
[core@testing ~]$ ip -6 route | grep default
default via 2001::::3 dev bond1 proto static metric 25 pref medium
default via 2001::::1 dev br-ex proto static metric 48 pref medium
[core@testing ~]$
[root@testing ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond1 | grep METRIC
IPV4_ROUTE_METRIC=25
IPV6_ROUTE_METRIC=25
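For keyfile-based connections (`*.nmconnection`), the equivalent metric override can be sketched as follows (the connection file name is illustrative):

```
# /etc/NetworkManager/system-connections/bond1.nmconnection (sketch)
[ipv4]
route-metric=25

[ipv6]
route-metric=25
```

The same values can also be set with `nmcli connection modify bond1 ipv4.route-metric 25 ipv6.route-metric 25`, followed by re-activating the connection for the change to take effect.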
Root Cause
[1] This metric is pushed by the machine-config-operator via the following template, as referenced on GitHub:
https://github.com/openshift/machine-config-operator/blob/master/templates/common/_base/files/configure-ovs-network.yaml#L140
// Notes from Engineering
# when creating the bridge, we use a value lower than NM's ethernet device default route metric
# (we pick 48 and 49 to be lower than anything that NM chooses by default)
Diagnostic Steps
- Observe that when you have applied your configuration to use `routingViaHost=true` on a node that is using a secondary interface for br-ex, the routes are misconfigured: br-ex still places priority routing on the original/primary interface rather than the overridden interface you've selected to handle networking.
- Observe that pods deployed as hostNetwork (and nodes themselves in this configuration) cannot reach the primary default service IP for kube-apiserver
- You have ensured that the network configuration also includes `ipForwarding=Global`
- You have ensured that your secondary interface includes a default gateway configuration to allow routing appropriately
- You have ensured that `NODEIP_HINT` is set properly as per our documentation
- If br-ex has a gateway configured after migration as below, you aren't hitting this bug:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.0.1 0.0.0.0 UG 48 0 0 br-ex <------- default route via br-ex
192.168.0.0 0.0.0.0 255.255.255.0 U 48 0 0 br-ex
192.168.1.0 0.0.0.0 255.255.254.0 U 100 0 0 ens192
10.0.137.0 192.168.1.1 255.255.255.0 UG 0 0 0 ens192
10.0.193.15 192.168.1.1 255.255.255.255 UGH 100 0 0 ens192
10.0.193.34 192.168.1.1 255.255.255.255 UGH 0 0 0 ens192
10.0.193.34 192.168.1.1 255.255.255.255 UGH 100 0 0 ens192
10.0.193.157 192.168.1.1 255.255.255.255 UGH 0 0 0 ens192
169.254.169.0 0.0.0.0 255.255.255.248 U 0 0 0 br-ex
169.254.169.1 0.0.0.0 255.255.255.255 UH 0 0 0 br-ex
169.254.169.3 172.41.8.1 255.255.255.255 UGH 0 0 0 ovn-k8s-mp0
172.40.64.0 169.254.169.4 255.255.192.0 UG 0 0 0 br-ex
172.41.0.0 172.41.8.1 255.255.128.0 UG 0 0 0 ovn-k8s-mp0
172.41.8.0 0.0.0.0 255.255.254.0 U 0 0 0 ovn-k8s-mp0
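The check above ("default route via br-ex") can be sketched as a one-liner against a `route -n`-style table. Sample data below mirrors the output above; the field positions assume the standard `route -n` column order.

```shell
# Sample route -n output (destination gateway genmask flags metric ref use iface).
table='0.0.0.0 192.168.0.1 0.0.0.0 UG 48 0 0 br-ex
192.168.0.0 0.0.0.0 255.255.255.0 U 48 0 0 br-ex
192.168.1.0 0.0.0.0 255.255.254.0 U 100 0 0 ens192'

# A default route (destination 0.0.0.0) whose interface is br-ex means the
# migration left br-ex with a gateway, i.e. this bug does not apply.
if echo "$table" | awk '$1 == "0.0.0.0" && $8 == "br-ex"' | grep -q .; then
    echo "default route via br-ex present: not hitting this bug"
fi
# → default route via br-ex present: not hitting this bug
```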
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.