Duplicate Egress IP node assignments in Red Hat OpenShift Container Platform 4.x
Issue
- Pods using EgressIP cannot reach upstream target addresses; curl requests to those addresses receive no reply.
- Packet traces indicate that the packet leaves the pod/host network and reaches the endpoint, but the return packets are captured by the wrong node (not the EgressIP host).
- The NAT tables in the OVN northbound database contain multiple duplicate, stale entries linking the EgressIP to nodes that are possible EgressIP hosts but not the active one, which results in NAT errors and traffic flow problems.
- As the issue starts to worsen, curl requests may succeed only roughly 50% of the time: some pods deploy and can reach upstream, while others deploy and cannot.
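The duplicate-entry symptom can be illustrated with a minimal Python sketch. It assumes the NAT rows have already been exported from the OVN northbound database and flattened into dictionaries; the `node` key and the sample addresses are hypothetical and chosen only for illustration, so the exact fields in a real cluster may differ. The sketch groups SNAT rows by external IP and reports any IP associated with more than one node.

```python
from collections import defaultdict


def find_duplicate_egress_ips(nat_rows):
    """Return {external_ip: sorted node names} for IPs tied to more than one node.

    nat_rows is an iterable of dicts with the keys "type", "external_ip" and
    "node". The "node" key is a hypothetical, pre-flattened field.
    """
    nodes_by_ip = defaultdict(set)
    for row in nat_rows:
        # This sketch only considers plain SNAT rows.
        if row.get("type") != "snat":
            continue
        nodes_by_ip[row["external_ip"]].add(row["node"])
    # Keep only the IPs that appear on two or more distinct nodes.
    return {ip: sorted(nodes) for ip, nodes in nodes_by_ip.items() if len(nodes) > 1}


# Hypothetical sample data (documentation address range), not real cluster output.
sample_rows = [
    {"type": "snat", "external_ip": "192.0.2.10", "node": "worker-0"},
    {"type": "snat", "external_ip": "192.0.2.10", "node": "worker-1"},
    {"type": "snat", "external_ip": "192.0.2.11", "node": "worker-2"},
    {"type": "dnat_and_snat", "external_ip": "192.0.2.12", "node": "worker-0"},
]

if __name__ == "__main__":
    for ip, nodes in find_duplicate_egress_ips(sample_rows).items():
        print(f"{ip} is assigned to multiple nodes: {', '.join(nodes)}")
```

An EgressIP that shows up under more than one node in such a report matches the stale-entry condition described above; an EgressIP that maps to exactly one node would not be flagged.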
Environment
- Red Hat OpenShift Container Platform (OCP) 4.8+
- Using the OVN-Kubernetes default Container Network Interface (CNI)