[RHOCP 4] EgressIP with OVN-Kubernetes causes network connectivity issues
Environment
- Red Hat OpenShift Container Platform (RHOCP)
  - 4.6
  - 4.7
  - 4.8
  - 4.9
- OVN-Kubernetes
Issue
- Pods have connectivity issues when using an EgressIP, including:
  - Connections being dropped from time to time.
  - Network timeouts when connecting to services outside the cluster.
- After adding an EgressIP resource to selected Namespaces/Pods, the configuration may initially function correctly, but after some time the pods may start to display the above issues.
Resolution
- The issue was identified as a bug and tracked by the Red Hat Engineering team under BZ #1976215.
- The issue has been fixed in the following versions:
  - v4.9.0 Errata RHSA-2021:3759
  - v4.8.10 Errata RHBA-2021:3299
  - v4.7.30 Errata RHBA-2021:3422
  - v4.6.45 Errata RHBA-2021:3517
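As a rough aid for comparing a cluster against the fixed versions listed above, the following sketch checks whether a given version string is at or above the fixed z-stream for its minor release. The function names `fixed_in` and `has_egressip_fix` are hypothetical helpers introduced here for illustration. The sketch assumes plain `x.y.z` version strings (no pre-release suffixes) and GNU `sort -V`, and it reports minors outside the list above as unknown rather than guessing.

```shell
# Minimum fixed z-stream per minor release, taken from the list in this article.
# fixed_in is a hypothetical helper, not part of any Red Hat tooling.
fixed_in() {
  case "$1" in
    4.6) echo 4.6.45 ;;
    4.7) echo 4.7.30 ;;
    4.8) echo 4.8.10 ;;
    4.9) echo 4.9.0 ;;
    *) return 1 ;;   # minor release not covered by this article
  esac
}

# has_egressip_fix VERSION
# Returns 0 if VERSION is at or above the fixed z-stream for its minor,
# 1 if it is below, and 2 if the minor release is not in the list.
has_egressip_fix() {
  version=$1
  minor=$(printf '%s\n' "$version" | cut -d. -f1,2)
  fixed=$(fixed_in "$minor") || return 2
  # sort -V orders version strings numerically per component; if the fixed
  # version sorts first (or is equal), VERSION already contains the fix.
  lowest=$(printf '%s\n%s\n' "$fixed" "$version" | sort -V | head -n1)
  [ "$lowest" = "$fixed" ]
}
```

On a live cluster, the version could be supplied with something like `has_egressip_fix "$(oc get clusterversion version -o jsonpath='{.status.desired.version}')"`, assuming the logged-in user is allowed to read the ClusterVersion resource.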
Diagnostic Steps
- To check the current status of the NBDB (the OVN Northbound database) for an EgressIP, run the following, with EGRESS_NAME set to the name of the EgressIP resource:

    EGRESS_NAME=egressips-crd-name
    POD=$(oc -n openshift-ovn-kubernetes get pod -o custom-columns=POD:.metadata.name --no-headers --selector='app==ovnkube-master' | head -n1)
    oc -n openshift-ovn-kubernetes exec $POD -c ovnkube-master -it -- \
      ovsdb-client --private-key=/ovn-cert/tls.key --certificate=/ovn-cert/tls.crt --ca-cert=/ovn-ca/ca-bundle.crt \
      -f csv --no-headings dump ssl:localhost:9641 OVN_Northbound NAT \
      | grep "name=$EGRESS_NAME" | tr -d '"' | cut -d ',' -f5,9 | sort -u
- For instance, in the following output, observe that 123.xx.xx.xx is attached to both k8s-worker-zzzzz and k8s-worker-yyyyy. This is indicative of the known bug.

    123.xx.xx.xx,k8s-worker-zzzzz
    123.xx.xx.xx,k8s-worker-yyyyy
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.