One potential issue is a kube node constantly restarting due to a problem at the networking layer.
Symptoms:
In the kube node logs:
E0903 16:31:00.704454 113 pod_workers.go:190] Error syncing pod #### ("local-dns-dispatcher-kube-node-3-kubelet.kubernetes-cluster1.mesos_kube-system(###)"), skipping: failed to "CreatePodSandbox" for "local-dns-dispatcher-kube-node-3-kubelet.kubernetes-cluster1.mesos_kube-system(###)" with CreatePodSandboxError: "CreatePodSandbox for pod \"local-dns-dispatcher-kube-node-3-kubelet.kubernetes-cluster1.mesos_kube-system(###)\" failed: rpc error: code = Unknown desc = failed pulling image \"gcr.io/google_containers/pause-amd64:3.1\": Error response from daemon: Get https://gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"

To debug this issue, you can run a test to pull an image on the Mesos host:

# docker pull gcr.io/google_containers/pause-amd64:3.1
3.1: Pulling from google_containers/pause-amd64
67ddbfb20a22: Pull complete
Digest: sha256:59eec8#######################610
Status: Downloaded newer image for gcr.io/google_containers/pause-amd64:3.1

And within the Kubernetes container:

dcos task exec -it "kube-node-3" /bin/bash
/mnt/mesos/sandbox# docker pull gcr.io/google_containers/pause-amd64:3.1
/mnt/mesos/sandbox# Error response from daemon: Get https://gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
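If the pull succeeds on the host but times out inside the container, a proxy setting visible to one environment but not the other is a likely cause. The following sketch checks the usual places proxy configuration lives; the systemd unit name "docker" and the drop-in directory are typical defaults and may differ on your Mesos hosts.

```shell
#!/bin/sh
# Sketch: look for proxy settings that could block the Docker daemon's
# access to gcr.io (a common cause of "Client.Timeout exceeded" errors).
# Check the shell environment for proxy variables.
proxy_env=$(env | grep -iE '^(https?_proxy|no_proxy)=' || true)
# Check systemd drop-ins for the Docker daemon (path is a common default).
proxy_unit=$(grep -rhi proxy /etc/systemd/system/docker.service.d/ 2>/dev/null || true)
msg="no proxy configuration found"
if [ -n "$proxy_env$proxy_unit" ]; then
  msg="proxy configuration found; verify it allows traffic to gcr.io"
fi
echo "$msg"
```

Run the same check both on the Mesos host and inside the kube-node container; a mismatch between the two points at the proxy as the culprit.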
Workaround:
You might check with the networking team whether a proxy is blocking traffic to your Kubernetes pod.
You might also check whether the iptables rules on the Mesos host are correctly set/enabled, as a missing or wrong rule can prevent traffic from reaching the pod.
For example, check that the following iptables rule is enabled:
"-A POSTROUTING -s 9.0.0.0/8 -m set --match-set overlay dst -j MASQUERADE"