OCP 4.18: EgressIP traffic is interrupted or fails intermittently after upgrade
Issue
- After an upgrade from an earlier version of OpenShift (4.17 or lower), EgressIPs are applied inconsistently to traffic leaving the nodes.
- Some pods periodically have their traffic NATed to the IP of the local host node instead of the EgressIP.
- Traffic is intermittently dropped, either because firewall rules reject traffic sourced from the node IP, or because return-path traffic is sent to the wrong host (stale NAT entries).
- Critically, an OVN-Kubernetes DB rebuild does NOT resolve the issue on affected egress-assignable nodes.
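
One way to confirm the symptoms above is to record the source address that an external server sees for repeated requests from an affected pod, then compare those addresses with the configured EgressIP and the node IPs. The following is a minimal sketch of that comparison, not a supported diagnostic tool. The function name `classify_sources` and all addresses are hypothetical, and it assumes the observations were collected separately (for example, from the access log of an external test server).

```python
from collections import Counter


def classify_sources(observed, egress_ips, node_ips):
    """Count observed source addresses by whether they match an EgressIP,
    a node IP, or neither."""
    egress = set(egress_ips)
    nodes = set(node_ips)
    counts = Counter()
    for addr in observed:
        if addr in egress:
            counts["egress_ip"] += 1   # traffic left with the expected EgressIP
        elif addr in nodes:
            counts["node_ip"] += 1     # traffic was NATed to a node IP instead
        else:
            counts["unknown"] += 1     # address matches neither list
    return dict(counts)


# Hypothetical sample data (documentation address ranges, not real cluster IPs).
observed = ["192.0.2.10", "192.0.2.10", "198.51.100.5", "192.0.2.10"]
result = classify_sources(
    observed,
    egress_ips=["192.0.2.10"],
    node_ips=["198.51.100.5", "198.51.100.6"],
)
print(result)
```

A nonzero `node_ip` count for a pod that should always use an EgressIP is consistent with the intermittent behavior described in this article.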
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- OVN-Kubernetes
- EgressIP is used