Customer Advisory
Advisory ID: | D2IQ-2020-0007
---|---
Severity: | Critical
Synopsis: | Cluster Certificates do not automatically renew on cluster update
Affected Product(s): | Konvoy 1.2.x, 1.3.x, 1.4.x
Issue date: | 08-17-2020
Updated on: | 08-17-2020 (Initial Advisory)
Issue
A known bug in kubeadm versions prior to 1.17 prevents it from automatically renewing the Kubernetes cluster certificates when the cluster is updated. As a result, after being active for one year, the cluster will cease functioning because its certificates have expired. Please see the following link for more information about this bug: https://github.com/kubernetes/kubeadm/issues/1818.
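To quickly check how close a cluster's certificates are to expiring, one option is to inspect a certificate directly with openssl. As a sketch, assuming the default kubeadm PKI location of /etc/kubernetes/pki:
sudo openssl x509 -enddate -noout -in /etc/kubernetes/pki/apiserver.crt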
Resolution
Upgrading Konvoy to version 1.5.0 or newer ensures that your certificates are automatically renewed as part of the upgrade process.
If you are unable to upgrade because the certificates have already expired, or you want to verify their expiration dates, you can use kubeadm to renew them manually. You must perform the following steps on each control plane node in the cluster.
SSH to each control plane node and run:
sudo kubeadm alpha certs check-expiration
This will list the certificates on this control plane and their expiration status:
-sh-4.2$ sudo kubeadm alpha certs check-expiration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Aug 14, 2021 18:43 UTC   364d            no
apiserver                  Aug 14, 2021 18:43 UTC   364d            no
apiserver-etcd-client      Aug 14, 2021 18:43 UTC   364d            no
apiserver-kubelet-client   Aug 14, 2021 18:43 UTC   364d            no
controller-manager.conf    Aug 14, 2021 18:43 UTC   364d            no
etcd-healthcheck-client    Aug 14, 2021 18:43 UTC   364d            no
etcd-peer                  Aug 14, 2021 18:43 UTC   364d            no
etcd-server                Aug 14, 2021 18:43 UTC   364d            no
front-proxy-client         Aug 14, 2021 18:43 UTC   364d            no
scheduler.conf             Aug 14, 2021 18:43 UTC   364d            no
If your certificates are at risk of expiring soon, or the cluster is already down because they have expired, first back up the /etc/kubernetes/pki directory on all control plane nodes.
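As a sketch, a timestamped copy could be taken along these lines (the backup path is illustrative):
sudo cp -a /etc/kubernetes/pki /etc/kubernetes/pki.bak-$(date +%F)
Then you can renew the certificates with the following command: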
sudo kubeadm alpha certs renew all
This will list all certificates that it updates as well as any issues it encounters:
-sh-4.2$ sudo kubeadm alpha certs renew all
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healtcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
After renewing your certificates, run sudo kubeadm alpha certs check-expiration again to confirm they are now valid for one year from the current date.
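Note that the renewal also covers the certificate embedded in /etc/kubernetes/admin.conf. If you use a copy of that file as your local kubeconfig, you may need to refresh it as well; as a sketch, assuming the common kubeadm layout:
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config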
On rare occasions, customers have found that after renewing the certificates, some control-plane components (namely kube-apiserver, kube-controller-manager, and kube-scheduler) do not load the new certificates and have issues communicating with the API server and/or etcd. When this issue is encountered, events like the following are logged by the kube-apiserver and kube-scheduler:
kube-apiserver:
E0712 15:39:52.988593 1 authentication.go:53] Unable to authenticate the request due to an error: x509: certificate has expired or is not yet valid: current time 2022-07-12T15:39:52Z is after 2022-07-09T10:09:07Z
kube-scheduler:
E0712 15:47:37.749691 1 leaderelection.go:325] error retrieving resource lock kube-system/kube-scheduler: Unauthorized
E0712 15:47:39.896591 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized
To resolve this issue, we recommend forcing the kubelet to restart the affected control-plane components. Since kube-apiserver, etcd, kube-scheduler, and kube-controller-manager are configured as static pods, their manifest files are located in /etc/kubernetes/manifests/:
ls -al /etc/kubernetes/manifests/
total 20
drwxr-xr-x. 2 root root 113 Jul 13 13:18 .
drwxr-xr-x. 8 root root 243 Jul 13 13:23 ..
-rw-------. 1 root root 2308 Jul 13 13:18 etcd.yaml
-rw-------. 1 root root 4937 Jul 13 13:18 kube-apiserver.yaml
-rw-------. 1 root root 3678 Jul 13 13:18 kube-controller-manager.yaml
-rw-------. 1 root root 2305 Jul 13 13:18 kube-scheduler.yaml
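The kubelet watches this directory for changes: removing a manifest stops the corresponding static pod, and restoring it starts a fresh one. On kubeadm-provisioned nodes this behavior comes from the staticPodPath field of the kubelet configuration, typically found in /var/lib/kubelet/config.yaml:
# excerpt from the kubelet configuration (kubeadm default)
staticPodPath: /etc/kubernetes/manifests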
Back up /etc/kubernetes/manifests, then remove the manifest files for the kube-apiserver, kube-scheduler, and kube-controller-manager:
cp -r /etc/kubernetes/manifests{,.bak}
rm /etc/kubernetes/manifests/kube-*
Before moving the manifests back to /etc/kubernetes/manifests/, we suggest checking with crictl to confirm that the containers were actually removed:
crictl ps -a | grep -iE "kube-apiserver|etcd|kube-scheduler|kube-controller-manager"
Then move the files back:
cp /etc/kubernetes/manifests.bak/kube-* /etc/kubernetes/manifests/
and confirm that the control-plane component containers are up and running:
crictl ps -a | grep -iE "kube-apiserver|etcd|kube-scheduler|kube-controller-manager"
d5d725a2ee7bb 70ae19262a812 About a minute ago Running kube-scheduler 0 b58a1102b2f96
1d7bf05c121db d862ea5b2791f About a minute ago Running kube-apiserver 0 c5cece331e3ea
0e646b8790838 03ec922c75bc0 About a minute ago Running kube-controller-manager 0 ccbb882c4c0d6
e695642316090 0369cf4303ffd 2 hours ago Running etcd 1 05fe04f1e0038
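As a final sanity check, you can confirm that the API server is serving requests again. As a sketch, using the node-local admin kubeconfig:
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes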