If you are having issues with DNS in your cluster, you can use this quick guide to get more information to provide to D2IQ support. Gather the output of each command and add it to your support ticket.
Start by checking the CoreDNS logs, where you may see error entries such as:
$> kubectl logs --namespace=kube-system -l k8s-app=kube-dns
[ERROR] plugin/errors: 2 node.domain.com. A: read udp <NODEIP>:45209-><DNSSERVER>:53: i/o timeout
[ERROR] plugin/errors: 2 node.domain.com. AAAA: read udp <NODEIP>:52088-><DNSSERVER>:53: i/o timeout
[ERROR] plugin/errors: 2 node.svc.cluster.local.domain.com. A: read udp <NODEIP>:48902-><DNSSERVER>:53: i/o timeout
[ERROR] plugin/errors: 2 node.svc.cluster.local.domain.com. A: read udp <NODEIP>:38309-><DNSSERVER>:53: i/o timeout
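When the log contains many of these timeouts, it can help to see which upstream server they point at. The following is a minimal sketch, assuming the error lines keep the `read udp <source>-><upstream>: i/o timeout` shape shown above; the function name `summarize_timeouts` is made up for this example.

```shell
# Count CoreDNS "i/o timeout" errors per upstream DNS server.
# Reads log lines on stdin and prints "<count> <upstream>", highest count first.
# summarize_timeouts is a hypothetical helper name, not part of any tool.
summarize_timeouts() {
  # Keep only timeout lines and extract the address after "->".
  sed -n 's|.*->\([^ ]*\): i/o timeout.*|\1|p' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }'   # normalise the padding added by uniq -c
}

# Usage against a live cluster (assumes kubectl access to kube-system):
#   kubectl logs --namespace=kube-system -l k8s-app=kube-dns | summarize_timeouts
```

If every timeout points at the same upstream, that server or the route to it is a good place to start.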
If you do not see any errors in the CoreDNS logs, enable query logging by adding the `log` plugin to the Corefile in the CoreDNS ConfigMap (see the Kubernetes documentation on debugging DNS resolution). Once the ConfigMap is changed, run:
kubectl rollout restart -n kube-system deployment/coredns
Then repeat the process that pointed to a DNS failure. The failed lookups should now appear in the log, with entries similar to the following:
[INFO] 192.168.224.131:56527 - 25854 "A IN google.com.cluster.local. udp 42 false 512" NXDOMAIN qr,aa,rd 135 0.000082441s
[INFO] 192.168.224.131:53611 - 29229 "A IN google.com. udp 28 false 512" NOERROR qr,rd,ra 54 0.002412358s
[INFO] 192.168.224.131:35919 - 32688 "A IN google.com.svc.cluster.local. udp 46 false 512" NXDOMAIN qr,aa,rd 139 0.00010114s
[INFO] 192.168.224.131:45732 - 54213 "A IN google.com. udp 28 false 512" NOERROR qr,aa,rd,ra 54 0.000064642s
If you believe your pods are not communicating correctly with your CoreDNS pods, you can validate that requests from the pod reach CoreDNS. The first field of each entry, in this case `192.168.224.131`, is the IP address of the pod that generated the lookup. You can also use these entries to confirm that the response the pod receives matches the response CoreDNS sent, and that it is not being altered along the way. If you do not see any requests from your pod's IP, a networking issue may be preventing the pod from reaching CoreDNS.
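To pick out one pod's lookups from a busy log, you can filter on the client address. This is a rough sketch that assumes IPv4 clients and the `[INFO] <ip>:<port> - ...` layout shown above; `queries_from_ip` is a hypothetical helper name.

```shell
# Print only the CoreDNS query-log lines whose client address is the given pod IP.
# $1: pod IPv4 address; log lines are read from stdin.
# queries_from_ip is a hypothetical helper name.
queries_from_ip() {
  awk -v ip="$1" '
    $1 == "[INFO]" {
      split($2, client, ":")          # client[1] = IP, client[2] = source port
      if (client[1] == ip) print      # exact match, so 10.0.0.1 does not match 10.0.0.13
    }'
}

# Usage (assumes kubectl access; <namespace> and <podname> are placeholders):
#   pod_ip=$(kubectl get pod <podname> -n <namespace> -o jsonpath='{.status.podIP}')
#   kubectl logs --namespace=kube-system -l k8s-app=kube-dns | queries_from_ip "$pod_ip"
```

An empty result after the pod has attempted a lookup suggests the request never reached CoreDNS.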
If pods are having trouble querying external URLs, the cause may be the `ndots` configuration in the cluster. To confirm this, turn on query logging and look for the name you are querying. If `ndots` is the cause, you may see entries such as the following:
[INFO] <IP>:59175 - 29555 "A IN test.mytest.com.mytest.com. udp 55 false 512" NXDOMAIN qr,rd,ra 134 0.025318742s
In the above case, the search domain `mytest.com` has been appended to the name, giving `test.mytest.com.mytest.com.`. This indicates that the name we passed is not being treated as an FQDN. If this is the case, reducing the `ndots` value (the number of dots a name must contain to be tried as fully qualified first) or editing the search domains can help remedy the situation. Please note: configuring your cluster to have an `ndots` value of 1 for all pods will break pod-to-pod communication.
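The rule the resolver applies can be sketched in a few lines of shell. This follows the behaviour described in resolv.conf(5): a name with at least `ndots` dots is tried as-is first, otherwise the search list is walked first, and a trailing dot always marks the name as absolute. The function name `lookup_order` and the default of 5 (the value commonly found in a pod's /etc/resolv.conf) are assumptions for this example; check your own pod's file for the real value.

```shell
# Report how a resolver with the given ndots value treats NAME.
# $1: name to look up; $2: ndots value (defaults to 5, an assumption).
# lookup_order is a hypothetical helper name.
lookup_order() {
  name=$1
  ndots=${2:-5}
  case $name in
    *.) echo "absolute"; return ;;   # trailing dot: the search list is never used
  esac
  # Count the dots in the name.
  dots=$(printf '%s\n' "$name" | awk -F. '{ print NF - 1 }')
  if [ "$dots" -ge "$ndots" ]; then
    echo "as-is first"
  else
    echo "search list first"
  fi
}

# Example: test.mytest.com has 2 dots.
#   lookup_order test.mytest.com 5    # search domains are tried before the name itself
#   lookup_order test.mytest.com. 5   # trailing dot skips the search list
```

Querying the name with a trailing dot is a quick way to confirm the diagnosis without changing any cluster configuration.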
First, check whether all the nodes have the same issue. SSH to each node and run a simple nslookup:
nslookup production.domain.com 192.168.0.1 # check internal DNS
nslookup submit.funnycatvideos.com 220.127.116.11 # check external DNS
If the nslookup fails on the nodes, the issue is with your internal routing and should be corrected before continuing.
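With more than a handful of nodes, a small loop saves repeating the check by hand. This is a sketch only: it assumes passwordless SSH to each node and relies on nslookup exiting non-zero when a query fails, which current BIND versions do but older builds may not. `check_nodes` and the node names are placeholders.

```shell
# Run the same lookup on every node and report which nodes fail.
# $1: name to resolve; $2: DNS server to query; remaining args: node hostnames.
# check_nodes is a hypothetical helper name.
check_nodes() {
  name=$1
  server=$2
  shift 2
  for node in "$@"; do
    # Discard the output here; rerun by hand on any failing node to see the error.
    if ssh "$node" nslookup "$name" "$server" >/dev/null 2>&1; then
      echo "$node ok"
    else
      echo "$node FAILED"
    fi
  done
}

# Usage (node names are placeholders):
#   check_nodes production.domain.com 192.168.0.1 node-a node-b node-c
```

A failure on every node points at the DNS server or the shared network path, while a failure on a single node points at that node.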
Check production pod
Check and output the production pod's resolv.conf:
kubectl exec -n <namespace> <podname> -- cat /etc/resolv.conf
If the nodes are fine, you can move on to the pods. To begin, deploy a pod running the dnsutils container image from GCR. An example YAML manifest for creating the pod can be found on the Kubernetes DNS debugging page.
With this pod, you can run simple DNS diagnostic tools to check your environment. For example, run the following to check the resolver configuration file:
kubectl exec -ti dnsutils -- cat /etc/resolv.conf
Now, run the following commands to get the lay of the land.
kubectl exec -it dnsutils -- nslookup kubernetes.default
kubectl exec -it dnsutils -- nslookup problemdomain.com 192.168.0.1 #check internal DNS
kubectl exec -it dnsutils -- nslookup problemdomain.com 18.104.22.168 #check external DNS
kubectl exec -it dnsutils -- dig -x nodeipaddress
"nslookup" has three failures to be aware of:
- "SERVFAIL" indicates a failure with the DNS server - it's running, but broken.
- "NXDOMAIN" indicates that the DNS server doesn't know the domain name.
- "no servers could be reached" is normally indicative of a routing or other networking issue where the DNS server is unreachable.
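When collecting output from several pods or nodes, it can be handy to label each result with one of the three failure types above. The sketch below only looks for those three strings in whatever text it is given; the function name `classify_nslookup` and the labels it prints are made up for this example.

```shell
# Label nslookup output with one of the three failure types described above.
# Reads the nslookup output on stdin and prints a single label.
# classify_nslookup is a hypothetical helper name.
classify_nslookup() {
  out=$(cat)
  case $out in
    *"no servers could be reached"*) echo "unreachable" ;;  # routing or networking
    *SERVFAIL*)                      echo "servfail" ;;     # server running but broken
    *NXDOMAIN*)                      echo "nxdomain" ;;     # server does not know the name
    *)                               echo "other" ;;        # none of the three strings found
  esac
}

# Usage (assumes the dnsutils pod exists):
#   kubectl exec dnsutils -- nslookup problemdomain.com 2>&1 | classify_nslookup
```

The label is a convenience for sorting results; include the full nslookup output in the support ticket as well.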
The "dig -x" command checks the reverse DNS entry of the IP address. If configured correctly, the answer should be a PTR record for a name made up of the octets of the IP address in reverse order, ending with "in-addr.arpa". For more information, see RFC 1035.
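To make the reversed-octet form concrete, the sketch below builds the name that "dig -x" queries for an IPv4 address. The function name `ptr_name` is hypothetical, and the sketch does no validation beyond checking that there are four dot-separated fields.

```shell
# Build the in-addr.arpa name that a reverse lookup queries for an IPv4 address.
# $1: IPv4 address in dotted form. Prints nothing if there are not four fields.
# ptr_name is a hypothetical helper name.
ptr_name() {
  printf '%s\n' "$1" |
    awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 ".in-addr.arpa." }'
}

# Example: ptr_name 192.168.0.1 prints 1.0.168.192.in-addr.arpa.
# Querying that name for a PTR record asks the same question as "dig -x 192.168.0.1":
#   kubectl exec dnsutils -- dig +short PTR "$(ptr_name 192.168.0.1)"
```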
Armed with the details here, D2IQ support should be better prepared to help you diagnose your DNS issue.
There are also a number of knowledge base articles that go into more detail about specific DNS issues.