[RHOCP 4] EgressIP with OVN-Kubernetes causes network connectivity issues

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.6
    • 4.7
    • 4.8
    • 4.9
  • OVN-Kubernetes

Issue

  • Pods that use an EgressIP experience connectivity issues, including:

    • Connections are dropped intermittently.
    • Network timeouts occur when connecting to services outside the cluster.
  • After an EgressIP resource is added to the selected Namespaces/Pods, the configuration may initially function correctly, but after some time the pods may start to show the issues above.

Resolution

Diagnostic Steps

  • To check the current state of the OVN Northbound database (NBDB) for an EgressIP, run the following:

    # Name of the EgressIP resource to inspect (replace with your own)
    EGRESS_NAME=egressips-crd-name
    # Select the first ovnkube-master pod
    POD=$(oc -n openshift-ovn-kubernetes get pod -o custom-columns=POD:.metadata.name --no-headers  --selector='app==ovnkube-master' | head -n1)
    # Dump the NAT table of the NBDB, keep the rows that reference the EgressIP name, and print the unique values of fields 5 and 9
    oc -n openshift-ovn-kubernetes exec "$POD" -c ovnkube-master -it -- ovsdb-client --private-key=/ovn-cert/tls.key --certificate=/ovn-cert/tls.crt --ca-cert=/ovn-ca/ca-bundle.crt -f csv --no-headings dump ssl:localhost:9641 OVN_Northbound NAT | grep "name=$EGRESS_NAME" | tr -d '"' | cut -d ',' -f5,9 | sort -u
    
  • In the following example output, the same EgressIP 123.xx.xx.xx is attached to both k8s-worker-zzzzz and k8s-worker-yyyyy. An EgressIP attached to more than one node at the same time is indicative of the known bug.

    123.xx.xx.xx,k8s-worker-zzzzz
    123.xx.xx.xx,k8s-worker-yyyyy
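
When the output contains many entries, duplicates can be hard to spot by eye. The following is a minimal sketch, not part of the original solution, that flags any IP paired with more than one node. It assumes the input has the two-column `ip,node` form shown above. The function name `find_duplicate_egress_ips` is hypothetical.

```shell
# Hypothetical helper: reads "ip,node" lines on stdin and prints every IP
# that is paired with more than one distinct node, followed by those nodes.
find_duplicate_egress_ips() {
  sort -u | awk -F',' '
    NF >= 2 {
      count[$1]++
      nodes[$1] = (nodes[$1] == "" ? $2 : nodes[$1] " " $2)
    }
    END {
      for (ip in count)
        if (count[ip] > 1)
          print ip ": " nodes[ip]
    }'
}

# Sample input mirroring the example output above.
printf '%s\n' \
  '123.xx.xx.xx,k8s-worker-zzzzz' \
  '123.xx.xx.xx,k8s-worker-yyyyy' | find_duplicate_egress_ips
```

In practice, the output of the diagnostic command could be piped into the function instead of the sample input. No output means that no IP was found on more than one node.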
    
