Worker node name resolution fails after updating to RHEL 7.9
Issue
- After starting an OpenShift Container Platform worker node, the network or the SDN does not appear to be fully functional. The following errors are logged:
Oct 15 09:04:00 ip-10-140-10-132.eu-central-1.compute.internal dockerd-current[1909]: time="2020-10-15T07:04:16.309085682+02:00" level=error msg="Attempting next endpoint for pull after error: Get https://docker-registry.default.svc:5000/v2/: dial tcp: lookup docker-registry.default.svc on 10.140.10.2:53: no such host"
- After some time (between 15 and 30 minutes) the issue disappears and name resolution works as expected.
- The issue appeared after updating to Red Hat Enterprise Linux 7.9.
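To measure how long a freshly started node takes before the registry name becomes resolvable, a small polling script can help. The following is only a sketch, not part of the original report: the function name `wait_for_resolution`, its parameters, and the 30-second polling interval are illustrative assumptions, and it assumes Python 3 is available on the node. It uses the system resolver, so it exercises the same lookup path that fails in the log line above.

```python
import socket
import time


def wait_for_resolution(host, timeout=1800.0, interval=30.0,
                        resolve=socket.gethostbyname,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until `host` resolves.

    Returns the number of seconds that passed before the first successful
    lookup, or None if `timeout` seconds elapse without success.
    The name and defaults are hypothetical; adjust them as needed.
    """
    start = clock()
    while True:
        try:
            resolve(host)
            return clock() - start
        except OSError:
            # socket.gaierror ("no such host") is a subclass of OSError,
            # so failed lookups end up here and the loop retries.
            pass
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Calling `wait_for_resolution("docker-registry.default.svc")` right after the node starts would return the elapsed seconds once the lookup succeeds, or `None` if the name is still unresolvable after 30 minutes. The resolver, sleep, and clock are passed as parameters so the loop can be exercised without touching the network.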
Environment
- Red Hat OpenShift Container Platform (OCP) 3.11
- Red Hat Enterprise Linux 7.9
- Package cloud-init-19.4-7.el7.x86_64