OCP 4.x – SDN to OVN migration leading to connection timeouts

Solution Verified

Issue

  • After a live migration from OpenShift SDN to OVN-Kubernetes, image pulls on one or a few worker or router nodes start failing with dial tcp … i/o timeout when contacting the internal registry address, as in the example below:
oc get events
---
175m        Warning   Failed             pod/pod-1                       Failed to pull image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest": rpc error: code = DeadlineExceeded desc = Get "https://image-registry.openshift-image-registry.svc:5000/openshift/token?account=serviceaccount&scope=repository%3Aopenshift%2Ftools%3Apull": dial tcp 172.21.64.43:5000: i/o timeout
  • The same workload runs normally when it is scheduled on other nodes.

  • Running ovn-nbctl lr-route-list ovn_cluster_router on the affected node shows duplicate ECMP routes for many pod CIDRs, a clear sign that the local OVN databases contain stale data:

# Example script that lists the routes in ovn_cluster_router on every node. Only the output for the first node in the loop is shown, and the IP addresses have been modified to illustrate the duplicate routes you may see on your cluster.
for n in $(oc get node -o name); do
  echo "===== $n ====="
  oc debug -Tq "$n" -- \
    chroot /host sh -c \
    'crictl exec $(crictl ps --name ovnkube-controller -q | head -n1) \
     ovn-nbctl lr-route-list ovn_cluster_router'
done
---
===== node/master-0.testserver.lab.example.com =====
IPv4 Routes
Route Table <main>:
            10.64.0.0/23                100.88.0.6 dst-ip ecmp
            10.64.0.0/23                100.88.0.9 dst-ip ecmp
            10.64.2.0/23                100.88.0.2 dst-ip ecmp
            10.64.2.0/23                100.88.0.3 dst-ip ecmp
            10.65.0.0/23                100.88.0.10 dst-ip ecmp
            10.65.0.0/23                100.88.0.7 dst-ip ecmp
            10.65.2.0/23                100.88.0.3 dst-ip ecmp
            10.65.2.0/23                100.88.0.4 dst-ip ecmp
            10.65.4.0/23                100.88.0.7 dst-ip ecmp
            10.65.4.0/23                100.88.0.9 dst-ip ecmp
            10.66.0.0/23                100.88.0.11 dst-ip ecmp
            10.66.0.0/23                100.88.0.8 dst-ip ecmp
            10.66.2.0/23                100.88.0.10 dst-ip ecmp
            10.67.2.0/23                100.88.0.5 dst-ip ecmp
            10.67.0.0/14                100.64.0.11 src-ip ecmp
            10.67.0.0/14                100.64.0.8 src-ip ecmp
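
On a large cluster the route listing can be long, so it may help to filter it down to the prefixes that appear more than once. The following is a minimal sketch, not part of the original solution: the function name find_duplicate_routes is hypothetical, and it assumes the output has the column layout shown above (prefix, next hop, policy). It only reports repeated prefixes; it does not decide whether a given ECMP pair is actually stale.

```shell
# Hypothetical helper: reads `ovn-nbctl lr-route-list` output on stdin and
# prints one line per prefix/policy pair that has more than one next hop:
#   <prefix> <policy> <count> <comma-separated next hops>
find_duplicate_routes() {
  awk '
    # Route lines are assumed to look like: <prefix> <next-hop> <policy> [ecmp]
    index($1, "/") > 0 && ($3 == "dst-ip" || $3 == "src-ip") {
      key = $1 " " $3
      if (count[key]++ == 0) {
        order[++n] = key          # remember first-seen order for stable output
        hops[key] = $2
      } else {
        hops[key] = hops[key] "," $2
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        key = order[i]
        if (count[key] > 1) print key, count[key], hops[key]
      }
    }
  '
}
```

To use it, pipe the per-node command from the loop above into the function, for example by appending `| find_duplicate_routes` after the closing quote of the `oc debug` invocation. For the sample output shown, the first reported line would be `10.64.0.0/23 dst-ip 2 100.88.0.6,100.88.0.9`, while single-route prefixes such as 10.66.2.0/23 are omitted.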

Environment

  • Red Hat OpenShift Container Platform 4.15+ (bare-metal, OVN-Kubernetes CNI)

  • Live migration to OVN-K completed
