Customer Advisory
Advisory ID: | D2IQ-2020-0007
---|---
Severity: | Critical
Synopsis: | Cluster Certificates do not automatically renew on cluster update
Affected Product(s): | Konvoy 1.2.x, 1.3.x, 1.4.x
Issue date: | 08-17-2020
Updated on: | 08-17-2020 (Initial Advisory)
Issue
A known bug in kubeadm versions prior to 1.17 prevents it from automatically renewing the Kubernetes cluster certificates when the cluster is updated. As a result, after being active for one year, the cluster will cease functioning because its certificates have expired. Please see the following link for more information about this bug: https://github.com/kubernetes/kubeadm/issues/1818.
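To quickly check how close a cluster's certificates are to expiring, one option is to inspect a certificate directly with openssl. As a sketch, assuming the default kubeadm PKI location of /etc/kubernetes/pki:
sudo openssl x509 -enddate -noout -in /etc/kubernetes/pki/apiserver.crt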
Resolution
Upgrading Konvoy to version 1.5.0 or newer ensures that your certificates are automatically renewed as part of the upgrade process.
If you are unable to upgrade because the certificates have already expired, or you want to verify their expiration dates, you can use kubeadm to renew them manually. You must perform the following steps on each control plane node in the cluster.
SSH to each control plane node and run:
sudo kubeadm alpha certs check-expiration
This will list the certificates on this control plane and their expiration status:
-sh-4.2$ sudo kubeadm alpha certs check-expiration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
admin.conf                 Aug 14, 2021 18:43 UTC   364d            no
apiserver                  Aug 14, 2021 18:43 UTC   364d            no
apiserver-etcd-client      Aug 14, 2021 18:43 UTC   364d            no
apiserver-kubelet-client   Aug 14, 2021 18:43 UTC   364d            no
controller-manager.conf    Aug 14, 2021 18:43 UTC   364d            no
etcd-healthcheck-client    Aug 14, 2021 18:43 UTC   364d            no
etcd-peer                  Aug 14, 2021 18:43 UTC   364d            no
etcd-server                Aug 14, 2021 18:43 UTC   364d            no
front-proxy-client         Aug 14, 2021 18:43 UTC   364d            no
scheduler.conf             Aug 14, 2021 18:43 UTC   364d            no
If your certificates are at risk of expiring soon, or the cluster is already down because they have expired, first back up the /etc/kubernetes/pki directory on all control plane nodes.
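As a sketch, a timestamped copy could be taken along these lines (the backup path is illustrative):
sudo cp -a /etc/kubernetes/pki /etc/kubernetes/pki.bak-$(date +%F)
Then you can renew the certificates with the following command: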
sudo kubeadm alpha certs renew all
This will list all certificates that it updates as well as any issues it encounters:
-sh-4.2$ sudo kubeadm alpha certs renew all
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healtcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
After renewing your certificates, run sudo kubeadm alpha certs check-expiration again to confirm they are now valid for one year from the current date.
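Note that the renewal also covers the certificate embedded in /etc/kubernetes/admin.conf. If you use a copy of that file as your local kubeconfig, you may need to refresh it as well; as a sketch, assuming the common kubeadm layout:
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config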
On rare occasions, customers have found that after renewing the certificates, some control-plane components (namely kube-apiserver, kube-controller-manager, and kube-scheduler) do not load the new certificates and have issues communicating with the API server and/or etcd. When this issue is encountered, events like the following are logged by the kube-apiserver and kube-scheduler:
kube-apiserver:
E0712 15:39:52.988593 1 authentication.go:53] Unable to authenticate the request due to an error: x509: certificate has expired or is not yet valid: current time 2022-07-12T15:39:52Z is after 2022-07-09T10:09:07Z
kube-scheduler:
E0712 15:47:37.749691 1 leaderelection.go:325] error retrieving resource lock kube-system/kube-scheduler: Unauthorized
E0712 15:47:39.896591 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized
To resolve this issue, we recommend forcing the kubelet to restart the affected control-plane components. Since kube-apiserver, etcd, kube-scheduler, and kube-controller-manager are configured as static pods, their manifest files are located in /etc/kubernetes/manifests/:
ls -al /etc/kubernetes/manifests/
total 20
drwxr-xr-x. 2 root root 113 Jul 13 13:18 .
drwxr-xr-x. 8 root root 243 Jul 13 13:23 ..
-rw-------. 1 root root 2308 Jul 13 13:18 etcd.yaml
-rw-------. 1 root root 4937 Jul 13 13:18 kube-apiserver.yaml
-rw-------. 1 root root 3678 Jul 13 13:18 kube-controller-manager.yaml
-rw-------. 1 root root 2305 Jul 13 13:18 kube-scheduler.yaml
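The kubelet watches this directory for changes: removing a manifest stops the corresponding static pod, and restoring it starts a fresh one. On kubeadm-provisioned nodes this behavior comes from the staticPodPath field of the kubelet configuration, typically found in /var/lib/kubelet/config.yaml:
# excerpt from the kubelet configuration (kubeadm default)
staticPodPath: /etc/kubernetes/manifests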
Back up /etc/kubernetes/manifests, then remove the manifest files for the kube-apiserver, kube-scheduler, and kube-controller-manager:
cp -r /etc/kubernetes/manifests{,.bak}
rm /etc/kubernetes/manifests/kube-*
Before moving the manifests back to /etc/kubernetes/manifests/, we suggest checking with crictl to confirm that the containers were actually removed:
crictl ps -a | grep -iE "kube-apiserver|etcd|kube-scheduler|kube-controller-manager"
Then move the files back:
cp /etc/kubernetes/manifests.bak/kube-* /etc/kubernetes/manifests/
and confirm that the control-plane component containers are up and running:
crictl ps -a | grep -iE "kube-apiserver|etcd|kube-scheduler|kube-controller-manager"
d5d725a2ee7bb 70ae19262a812 About a minute ago Running kube-scheduler 0 b58a1102b2f96
1d7bf05c121db d862ea5b2791f About a minute ago Running kube-apiserver 0 c5cece331e3ea
0e646b8790838 03ec922c75bc0 About a minute ago Running kube-controller-manager 0 ccbb882c4c0d6
e695642316090 0369cf4303ffd 2 hours ago Running etcd 1 05fe04f1e0038
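As a final sanity check, you can confirm that the API server is serving requests again. As a sketch, using the node-local admin kubeconfig:
sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes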