OCP 4.x – SDN to OVN migration leading to connection timeouts
Issue
- After a live migration from OpenShift SDN to OVN-Kubernetes, one or a few worker or router nodes start failing image pulls with
dial tcp … i/o timeout
when connecting to the internal registry address, as in the example below:
oc get events
---
175m Warning Failed pod/pod-1 Failed to pull image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest": rpc error: code = DeadlineExceeded desc = Get "https://image-registry.openshift-image-registry.svc:5000/openshift/token?account=serviceaccount&scope=repository%3Aopenshift%2Ftools%3Apull": dial tcp 172.21.64.43:5000: i/o timeout
-
The same workload runs normally when it is scheduled on other nodes.
-
ovn-nbctl lr-route-list ovn_cluster_router
executed on the affected node shows duplicate ECMP routes for many pod CIDRs, a clear sign that the local OVN databases contain stale data:
# Example script that prints the routes in ovn_cluster_router for every node. Only the output for the first node in the loop is shown below, and the IP addresses have been modified to illustrate the duplicated routes you may see on your cluster.
for n in $(oc get node -o name); do
  echo "===== $n ====="
  oc debug -Tq "$n" -- \
    chroot /host sh -c \
      'crictl exec $(crictl ps --name ovnkube-controller -q | head -n1) \
         ovn-nbctl lr-route-list ovn_cluster_router'
done
---
===== node/master-0.testserver.lab.example.com =====
IPv4 Routes
Route Table <main>:
10.64.0.0/23 100.88.0.6 dst-ip ecmp
10.64.0.0/23 100.88.0.9 dst-ip ecmp
10.64.2.0/23 100.88.0.2 dst-ip ecmp
10.64.2.0/23 100.88.0.3 dst-ip ecmp
10.65.0.0/23 100.88.0.10 dst-ip ecmp
10.65.0.0/23 100.88.0.7 dst-ip ecmp
10.65.2.0/23 100.88.0.3 dst-ip ecmp
10.65.2.0/23 100.88.0.4 dst-ip ecmp
10.65.4.0/23 100.88.0.7 dst-ip ecmp
10.65.4.0/23 100.88.0.9 dst-ip ecmp
10.66.0.0/23 100.88.0.11 dst-ip ecmp
10.66.0.0/23 100.88.0.8 dst-ip ecmp
10.66.2.0/23 100.88.0.10 dst-ip ecmp
10.67.2.0/23 100.88.0.5 dst-ip ecmp
10.67.0.0/14 100.64.0.11 src-ip ecmp
10.67.0.0/14 100.64.0.8 src-ip ecmp
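On a cluster with many nodes, the route list is long and duplicated prefixes are easy to miss by eye. The following is a minimal sketch, not part of the original procedure, that filters the lr-route-list output down to the dst-ip prefixes that appear with more than one next hop. The function name find_duplicate_routes is hypothetical, and the column layout (prefix, next hop, policy) is assumed to match the sample output above. A reported prefix is only a hint that the node's OVN databases deserve a closer look.

```shell
# Hypothetical helper: reads `ovn-nbctl lr-route-list` output on stdin and
# prints each dst-ip prefix that has more than one next hop, in the form
#   <prefix> <number of routes> <comma-separated next hops>
# Assumes the columns are: prefix, next hop, policy (as in the sample output).
find_duplicate_routes() {
  awk '
    # Only consider destination-based routes; header lines and src-ip
    # routes do not match this condition.
    $3 == "dst-ip" {
      count[$1]++
      hops[$1] = (hops[$1] == "" ? $2 : hops[$1] "," $2)
    }
    END {
      for (p in count)
        if (count[p] > 1)
          printf "%s %d %s\n", p, count[p], hops[p]
    }
  ' | sort
}
```

To use it, append `| find_duplicate_routes` to the `oc debug ... ovn-nbctl lr-route-list ovn_cluster_router` command inside the loop shown earlier. Nodes that print nothing have no prefix with multiple dst-ip routes.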
Environment
-
Red Hat OpenShift Container Platform 4.15+ (bare-metal, OVN-Kubernetes CNI)
-
Live migration to OVN-K completed