OCP4 - Loopback calls to pods via single-endpoint service fail on 4.14+ when a secondary network defined by Multus is attached
Issue
- A pod making a call to a service that lists the pod itself as one of its endpoints (or its only endpoint) fails to connect if that pod has a secondary network defined by a network-attachment-definition.
- Curls from a pod to other services in the same namespace succeed, but curls to a service where the calling pod is itself listed as an endpoint fail to connect back to that pod (hairpinning).
- Specifically, Pod A in namespace A has a service, Service A, that has one endpoint. This service resolves to Pod A and can be used to route traffic within this namespace from peer pods. However, calls from Pod A to Service A (resolving back to Pod A) fail when a secondary IP address is applied to the container using Multus.
$ oc get pod -n test-namespace | grep -E 'A|B'
Pod-A 1/1 Running 0 4h26m
Pod-B 1/1 Running 0 4h28m
$ oc get svc -n test-namespace | grep 'A'
pod-a-service ClusterIP 172.30.41.170 <none> 8080/TCP 138d
$ oc rsh pod/Pod-A ip a
#truncated... (indicating we have a secondary IP attached with multus)...
10.xx.xx.99
120.xx.xx.145
$ oc rsh po/pod-A #call service A (hairpin back to self) (fails)
sh-4.4$ curl -v 172.30.41.170:8080
* Rebuilt URL to: 172.30.41.170:8080/
* Trying 172.30.41.170...
* TCP_NODELAY set
...timeout...
$ oc rsh po/pod-B #call service A from pod B (succeeds)
sh-4.4$ curl -v 172.30.41.170:8080
* Rebuilt URL to: 172.30.41.170:8080/
* Trying 172.30.41.170...
* TCP_NODELAY set
* Connected to 172.30.41.170 (172.30.41.170) port 8080 (#0)
> GET / HTTP/1.1
> Host: 172.30.41.170:8080
> User-Agent: curl/7.61.1
> Accept: */*
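The two conditions described above (the calling pod is itself an endpoint of the service, and the pod has a Multus secondary network) can be checked with a short shell sketch. This is a minimal illustration, not a supported diagnostic: the helper names `is_hairpin_candidate` and `check_hairpin` are hypothetical, and the sketch assumes the secondary network was requested through the standard `k8s.v1.cni.cncf.io/networks` pod annotation, which the transcript above does not show.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of oc or OpenShift.

# Succeeds (exit 0) when POD_IP appears among the given endpoint IPs, meaning
# a call from that pod to the service can be routed back to the pod itself.
# Usage: is_hairpin_candidate POD_IP ENDPOINT_IP...
is_hairpin_candidate() {
    pod_ip=$1
    shift
    for ep_ip in "$@"; do
        if [ "$ep_ip" = "$pod_ip" ]; then
            return 0
        fi
    done
    return 1
}

# Gathers the inputs from a live cluster and reports whether both conditions hold.
# Usage: check_hairpin NAMESPACE POD SERVICE
check_hairpin() {
    ns=$1 pod=$2 svc=$3
    pod_ip=$(oc get pod "$pod" -n "$ns" -o jsonpath='{.status.podIP}')
    ep_ips=$(oc get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}')
    # Assumption: the secondary network was requested via the Multus annotation.
    nets=$(oc get pod "$pod" -n "$ns" \
        -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/networks}')
    # $ep_ips is intentionally unquoted so each IP becomes its own argument.
    if is_hairpin_candidate "$pod_ip" $ep_ips && [ -n "$nets" ]; then
        echo "$pod is an endpoint of $svc and has a secondary network: hairpin calls may fail"
    else
        echo "$pod does not match the conditions described in this article"
    fi
}
```

With the names from the transcript above, the check would be invoked as `check_hairpin test-namespace Pod-A pod-a-service` from a session that is logged in to the cluster with `oc`.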
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- Observed in 4.14.33 and later
- OVN-Kubernetes CNI