openshift-dns pod placement can cause intermittent DNS resolution failures

Solution Verified

Issue

openshift-dns pods are deployed by a daemonset. In OCP 4.4.x, these pods did not tolerate the NoSchedule taint placed on nodes.
In OCP 4.5.x, however, the pods are deployed with the operator: "Exists" toleration, which tolerates taints placed on nodes, so the pods can be scheduled onto tainted nodes.
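
The effect of that toleration can be illustrated with a small model of the Kubernetes taint/toleration matching rules. This is a minimal sketch, not the scheduler's actual code: the Taint and Toleration classes, the tolerates and schedulable helpers, and the taint key example.com/load-balancer are all hypothetical names chosen for illustration. It assumes the documented semantics that an empty toleration key or effect matches any key or effect, and that operator "Exists" ignores the taint value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    operator: str = "Equal"  # Kubernetes treats an unset operator as "Equal"
    key: str = ""            # empty key matches every taint key
    value: str = ""
    effect: str = ""         # empty effect matches every taint effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True  # value is ignored
    return tol.operator == "Equal" and tol.value == taint.value


def schedulable(tolerations, taints) -> bool:
    """A pod may land on a node only if every hard taint is tolerated."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# Hypothetical taint reserving a node for the load balancer workload.
lb_taint = Taint(key="example.com/load-balancer", effect="NoSchedule")

# OCP 4.4.x-style behaviour described above: no matching toleration.
print(schedulable([], [lb_taint]))

# OCP 4.5.x-style behaviour: a bare operator "Exists" toleration.
print(schedulable([Toleration(operator="Exists")], [lb_taint]))
```

In this model a toleration that sets only operator: "Exists" matches any taint regardless of key, value, or effect, which is why a taint on the load balancer node does not keep the DNS daemonset pods away.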

In deployments with active/active LACP bonds and workloads that use SR-IOV (e.g., the F5 BIG-IP load balancer), local traffic from a pod towards the SR-IOV VF is dropped by the switching infrastructure. As a result, if openshift-dns pods are scheduled onto the load balancer node, traffic from the DNS pod to the load balancer is affected.
This causes intermittent DNS query failures.

Environment

Red Hat OpenShift Container Platform 4.5