Troubleshooting OpenShift Container Platform 4: DNS

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Container Platform (OCP) 4
  • Red Hat Enterprise Linux CoreOS (RHCOS)
  • Domain Name System (DNS)
  • Cluster OpenShift DNS Operator

Issue

  • Troubleshoot DNS issues in OCP 4

Diagnostic Steps

  1. Check the cluster operator to see if it is available:

    # oc get clusteroperator dns
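
A healthy operator normally reports Available=True, Progressing=False and Degraded=False. As a minimal sketch, the conditions can be printed one per line and filtered for anything else. The function name unexpected_conditions is a hypothetical helper, not part of oc.

```shell
# Hypothetical helper: reads "Type=Status" lines on stdin and prints every
# condition that differs from the usual healthy values.
unexpected_conditions() {
  awk -F= '
    $1 == "Available"   && $2 != "True"  { print }
    $1 == "Progressing" && $2 != "False" { print }
    $1 == "Degraded"    && $2 != "False" { print }
  '
}

# Usage against a live cluster (prints nothing when the operator is healthy):
# oc get clusteroperator dns \
#   -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#   | unexpected_conditions
```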
    
  2. Check that the pods and services exist in the openshift-dns-operator namespace:

    # oc get all  -n openshift-dns-operator  -o wide
    
  3. Check the logs of the dns-operator pod:

    # oc logs pod/`oc get pods -o=jsonpath="{.items[0].metadata.name}" -n openshift-dns-operator` -n openshift-dns-operator 2>/dev/null || oc logs pod/`oc get pods -o=jsonpath="{.items[0].metadata.name}" -n openshift-dns-operator` -c dns-operator -n openshift-dns-operator
    
  4. Check that the DNS components are running in the openshift-dns project:

    # oc get all  -n openshift-dns
    
  5. Check that the pod resolver points to the DNS Service IP:

    # export PODS=`oc get pods -o=jsonpath="{.items[*].metadata.name}" -n openshift-apiserver`
    # for pod in $PODS;do oc exec $pod -n openshift-apiserver -- cat /etc/resolv.conf ;done;
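
To avoid comparing the output by eye, the nameserver entries can be extracted and matched against the cluster IP of the DNS service. This is a sketch under two assumptions: the service is named dns-default in the openshift-dns namespace, and the helper names nameservers and uses_nameserver are hypothetical.

```shell
# Hypothetical helper: prints the nameserver addresses found in a
# resolv.conf-formatted stream on stdin, one per line.
nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

# Hypothetical helper: succeeds when the expected IP ($1) is among the
# nameservers read from stdin.
uses_nameserver() {
  nameservers | grep -qxF -- "$1"
}

# Usage against a live cluster (assumes the service is named dns-default):
# svc_ip=$(oc get svc dns-default -n openshift-dns -o jsonpath='{.spec.clusterIP}')
# for pod in $PODS; do
#   oc exec "$pod" -n openshift-apiserver -- cat /etc/resolv.conf \
#     | uses_nameserver "$svc_ip" || echo "$pod does not use $svc_ip"
# done
```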
    
  6. Check the CoreDNS container logs of every DNS pod:

    # export PODS=`oc get pods -o=jsonpath="{.items[*].metadata.name}" -n openshift-dns`
    # for pod in $PODS; do oc logs $pod -c dns -n openshift-dns|sed "s/^/$pod\t/"; echo; done
    
  7. Test resolving the Kubernetes service hostname from every DNS pod against every DNS pod IP:

    # oc get pods -n openshift-dns -o custom-columns="Pod Name:.metadata.name,Pod IP:.status.podIP,Node IP:.status.hostIP,Status:.status.phase"
    
    # DST_HOST=kubernetes.default.svc.cluster.local; for dnspod in `oc get pods -n openshift-dns -o name --no-headers`; do for dnsip in `oc get pods -n openshift-dns -o go-template='{{ range .items }} {{index .status.podIP }} {{end}}'`; do echo -ne "$dnspod\tquerying $DST_HOST to $dnsip ->\t"; oc exec -n openshift-dns $dnspod -- dig @$dnsip $DST_HOST -p 5353 +short 2>/dev/null ; done; done
    
  8. Test resolving external queries like redhat.com:

    # DST_HOST=redhat.com; for dnspod in `oc get pods -n openshift-dns -o name --no-headers`; do for dnsip in `oc get pods -n openshift-dns -o go-template='{{ range .items }} {{index .status.podIP }} {{end}}'`; do echo -ne "$dnspod\tquerying $DST_HOST to $dnsip ->\t"; oc exec -n openshift-dns $dnspod -- dig @$dnsip $DST_HOST -p 5353 +short 2>/dev/null ; done; done
    

    If the DNS operator has DNS forwarding entries configured, make sure to also run these commands for hostnames that resolve only through that forwarding.

    Also note that image lookups do not use the DNS operator.
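
To find hostnames worth testing through forwarding, the configured entries can be listed from the DNS operator resource. The sketch below assumes the entries live under spec.servers of dns.operator/default. The helper names list_forwarded_zones and first_zones are hypothetical, and the exact list formatting printed by jsonpath can differ between oc versions, so the parser accepts both bracketed forms.

```shell
# Hypothetical helper: prints one line per forwarding entry configured in
# the DNS operator, as "name<TAB>zones<TAB>upstreams".
list_forwarded_zones() {
  oc get dns.operator/default \
    -o jsonpath='{range .spec.servers[*]}{.name}{"\t"}{.zones}{"\t"}{.forwardPlugin.upstreams}{"\n"}{end}'
}

# Hypothetical helper: extracts the first zone of every entry from the
# output above, so that a hostname in each zone can be used as DST_HOST.
first_zones() {
  awk -F'\t' '{ gsub(/\[|\]|"/, "", $2); split($2, z, /[ ,]+/); print z[1] }'
}

# Usage against a live cluster:
# list_forwarded_zones | first_zones
```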

  9. From within a pod, measure the DNS lookup time versus the total request time:

    • IPv4 and IPv6:

      # echo $pod
      pod-example-5f78c768b-cg88c
      # oc exec $pod  -- bash -c 'while true; do echo -n "$(date)  "; curl -s -o /dev/null -w "%{time_namelookup} %{time_total} %{http_code}\n" https://www.redhat.com -k; sleep 10; done'
      
    • IPv4 only:

      # oc exec $pod -- bash -c 'while true; do echo -n "$(date)  "; curl -s -o /dev/null -w "%{time_namelookup} %{time_total} %{http_code}\n" --ipv4 https://www.redhat.com -k; sleep 10; done'
      
    • IPv6 only:

      # oc exec $pod -- bash -c 'while true; do echo -n "$(date)  "; curl -s -o /dev/null -w "%{time_namelookup} %{time_total} %{http_code}\n" --ipv6 https://www.redhat.com -k; sleep 10; done'
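
Each line printed by these loops ends with the lookup time, the total time and the HTTP status code. As a sketch, the output can be piped through a filter that keeps only the slow lookups. The helper name slow_lookups and the default threshold of 1 second are assumptions chosen for illustration.

```shell
# Hypothetical helper: reads the loop output on stdin and prints the lines
# whose DNS lookup time exceeds the threshold in seconds ($1, default 1),
# appending the share of the total request time spent on the lookup.
slow_lookups() {
  awk -v limit="${1:-1}" '
    NF >= 3 && $(NF-2) + 0 > limit + 0 {
      pct = ($(NF-1) > 0) ? 100 * $(NF-2) / $(NF-1) : 0
      printf "%s  lookup=%.0f%% of total\n", $0, pct
    }
  '
}

# Usage against a live cluster (append the filter to any of the loops):
# oc exec $pod -- bash -c 'while true; do ...; sleep 10; done' | slow_lookups 1
```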
      
  10. Run the following command, which executes nslookup from inside the pod named by $pod against every CoreDNS pod IP, to help isolate SDN connection issues towards the CoreDNS pods:

    # echo $pod
    pod-example-5f78c768b-cg88c
    # oc -n openshift-dns get pod -o wide | awk '!/IP/ {print $6}' | while read IP; do oc exec $pod -- nslookup -port=5353 www.redhat.com $IP; done
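
Steps 9 and 10 expect $pod to hold the name of a pod in the current project whose image provides bash, curl and nslookup. One possible way to set it, as a sketch only, is to take the first Running pod of a namespace. The helper name first_running_pod and the namespace my-project are hypothetical, and the helper does not check whether the selected pod actually contains those tools.

```shell
# Hypothetical helper: prints the name of the first Running pod in the
# namespace given as $1, or nothing when no pod is Running.
first_running_pod() {
  oc get pods -n "$1" --field-selector=status.phase=Running \
    -o jsonpath='{.items[*].metadata.name}' | awk '{ print $1; exit }'
}

# Usage against a live cluster (my-project is a placeholder namespace):
# oc project my-project
# pod=$(first_running_pod my-project)
# echo $pod
```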
    


12 Comments

Step 9 is broken -- both IPv4 & IPv6 are identical -- presumably IPv6 should be changed from '-4' to '-6'.

Thanks for reporting! I fixed it and made it clearer by using --ipv4 and --ipv6 instead of -4 and -6.

Step 9 indicates it wants a pod selected, but fails to give guidance on how to list pods that may be used.

In this case I think the intent was to use any pod capable of running the curl command towards "redhat.com". The step should perhaps clarify that by deploying a simple container for the example.

Step 3 is broken -- returns:

error: a container name must be specified for pod dns-operator-xxxxxxxxxx-xxxxx, choose one of: [dns-operator kube-rbac-proxy]

Presumably, the dns-operator is desired, which would suggest:

oc logs pod/`oc get pods -o=jsonpath="{.items[0].metadata.name}" -n openshift-dns-operator` -n openshift-dns-operator dns-operator

Before 4.4 this was not required. I added a bit of logic to it so it works in both cases.

For Step 6, I humbly suggest this improvement -- which will output the results to the screen like the rest of these steps (so everything can be logged or redirected as desired), rather than having a bunch of files created:

for pod in $PODS; do oc logs $pod -c dns -n openshift-dns|sed "s/^/$pod\t/"; echo; done

I think the idea was that users attach the files to a case. However, for consistency I have updated it with your suggestion, as no other command redirects data to files.

FWIW: Steps 7 & 8 can be combined -- humbly:

for dnspod in $(oc get pods -n openshift-dns -o name --no-headers); do dnsips=$(oc get pods -n openshift-dns -o go-template='{{ range .items }} {{index .status.podIP }} {{end}}'); oc exec -n openshift-dns $dnspod -c dns -- bash -c "for dnsip in $dnsips; do dig @\$dnsip kubernetes.default.svc.cluster.local -p 5353 +short 2>/dev/null|sed \"s/^/`date '+%T'`\t\$dnsip\tinternal: /\"; echo -e \"`date '+%T'`\t\$dnsip\"; dig @\$dnsip redhat.com -p 5353 +short 2>/dev/null|sed \"s/^/`date '+%T'`\t\$dnsip\texternal: /\"; date '+%T'; done"|sed "s|\(^[^\t]*\)|\1\t$dnspod|"; echo; done

Also note that step 8 references google, but redhat is actually being used.

Changed it to redhat.com. The tests are separated to show how we can test for various internal and external hostnames.

Step 10 ought to be clarified a bit -- "From inside a pod, run the following to discard" suggests one has to select a pod, enter it, then run the given command. It appears that the given command does this itself, so language more like "This will exec from inside a pod to discover" would fit better. Also, presumably "discard" was a typo for "discover" or "troubleshoot".

Also, step 10 is broken -- it requires a variable $POD to be set, but gives no direction on how to fetch something to set $POD to.

Thanks for reporting. I clarified the goal and what the $pod variable is expected to contain.