Problem
Kubelet is in a crash loop, and its log contains the following lines:
I1130 22:21:41.761345 569774 client.go:77] Connecting to docker on unix:///var/run/docker.sock
...
F1130 22:21:41.761619 569774 server.go:265] failed to run Kubelet: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
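The message is unexpected because Konvoy Kubernetes clusters use containerd, not Docker, as the container runtime. It suggests that the kubelet started without its containerd runtime flags and fell back to its default Docker runtime.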
Solution
Check that the /var/lib/kubelet/kubeadm-flags.env file exists and contains something like:
KUBELET_KUBEADM_ARGS="--cgroup-root= --cloud-provider= --container-runtime-endpoint=unix:///run/containerd/containerd.sock --container-runtime=remote --event-burst=30 --event-qps=0 --fail-swap-on=True --kube-api-burst=30 --kube-api-qps=15 --kube-reserved= --max-pods=110 --node-ip=172.19.1.2 --node-labels=konvoy.mesosphere.com/inventory_hostname=172.19.1.2 --pods-per-core=0"
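The check above can be scripted. The following is a minimal sketch, not an official Konvoy tool: the function name check_kubeadm_flags is hypothetical, and it assumes the containerd socket path shown in the example above.

```shell
#!/bin/sh
# Sketch: report whether the kubeadm flags file exists and points the
# kubelet at containerd. The function name is hypothetical.
check_kubeadm_flags() {
    # Default to the path named in this article; allow an override for testing.
    flags_file="${1:-/var/lib/kubelet/kubeadm-flags.env}"
    if [ ! -f "$flags_file" ]; then
        echo "missing: $flags_file"
        return 1
    fi
    # -F matches the flag literally; -e is needed because the pattern starts with "--".
    if grep -qF -e '--container-runtime-endpoint=unix:///run/containerd/containerd.sock' "$flags_file"; then
        echo "ok: containerd endpoint configured"
        return 0
    fi
    echo "suspect: no containerd endpoint in $flags_file"
    return 2
}
```

Calling check_kubeadm_flags with no argument on the affected node checks the default path. A "missing" or "suspect" result matches the failure described in this article.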
If the file does not exist:
- Copy it from one of the healthy nodes.
- In the copy, replace every occurrence of the healthy node's IP address with the IP address of the affected node.
- Restart the kubelet service if needed:
systemctl restart kubelet
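The IP replacement step can be sketched as a small shell function. This is an illustration under assumptions: the function name retarget_flags and the example file names are hypothetical, and the file is assumed to have been copied from the healthy node already (for example with scp).

```shell
#!/bin/sh
# Sketch: rewrite a kubeadm-flags.env copied from a healthy node so that it
# refers to the affected node. The function name is hypothetical.
# Usage: retarget_flags HEALTHY_IP TARGET_IP SOURCE_FILE DEST_FILE
retarget_flags() {
    healthy_ip="$1"
    target_ip="$2"
    src="$3"
    dst="$4"
    # Escape the dots so that sed matches the IP address literally.
    pattern=$(printf '%s' "$healthy_ip" | sed 's/\./\\./g')
    sed "s/$pattern/$target_ip/g" "$src" > "$dst"
}

# Example (hypothetical addresses and file names):
#   retarget_flags 172.19.1.2 172.19.1.3 ./kubeadm-flags.env.healthy /var/lib/kubelet/kubeadm-flags.env
#   systemctl restart kubelet
```

The substitution is a plain substring match, so an address such as 172.19.1.2 would also match inside 172.19.1.20. Review the resulting file before restarting the kubelet.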
If the file exists and its content looks correct, run the systemctl cat kubelet command and check that every configuration file mentioned in its output matches, apart from node-specific values, the corresponding file on one of the healthy nodes.
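To gather the files to compare, the paths can be pulled out of the systemctl cat output. The sketch below assumes the usual output layout, in which each unit or drop-in file is introduced by a "# /path" comment line and environment files appear on EnvironmentFile= lines. The function name list_unit_files is hypothetical.

```shell
#!/bin/sh
# Sketch: read "systemctl cat kubelet" output on stdin and print the file
# paths it mentions, one per line. The function name is hypothetical.
list_unit_files() {
    # "# /path" header comments name unit and drop-in files.
    # EnvironmentFile= lines name env files; a leading "-" marks the file as optional.
    sed -n \
        -e 's|^# \(/.*\)$|\1|p' \
        -e 's|^EnvironmentFile=-\{0,1\}\(/.*\)$|\1|p'
}

# Example:
#   systemctl cat kubelet | list_unit_files
```

Each printed path can then be compared with the same path on a healthy node, for example with diff. Ordinary comments inside a unit file that happen to start with "# /" would also be printed, so treat the list as a starting point.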