Problem
In DKP 2.7.0 and 2.7.1, the DKP UI may become inaccessible after 90 days and display the error "Internal Server Error; Failed to retrieve connector list".
Investigating further, users may observe the following messages in the dex pod logs:
kubectl logs -n kommander deploy/dex
...
time="2024-05-05T05:05:05Z" level=info msg="Connectors: 2"
time="2024-05-05T05:05:05Z" level=info msg="Invoking 1 hooks to filter connectors"
time="2024-05-05T05:05:05Z" level=info msg="Calling connectors webhook dex-controller"
time="2024-05-05T05:05:05Z" level=error msg="Failed to filter connectors: could not call webhook: failed to send request: Post \"https://dex-dex-controller-webhook-service:18443/connectors\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2024-05-05T05:05:05Z is after 2024-04-04T04:04:04Z"
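To see when the certificate expired, the expiry timestamp can be pulled out of the x509 error. The following is a minimal sketch that assumes the log format shown above; the function name cert_expiry_from_logs is hypothetical and not part of DKP.

```shell
# Hypothetical helper: read dex logs on stdin and print each distinct
# expiry timestamp (the value after "is after") found in x509 errors.
cert_expiry_from_logs() {
  sed -n 's/.*certificate has expired or is not yet valid: current time [^ ]* is after \([0-9TZ:-]*\).*/\1/p' | sort -u
}

# Usage (requires kubectl access to the kommander namespace):
#   kubectl logs -n kommander deploy/dex | cert_expiry_from_logs
```

If the function prints nothing, the logs contain no expired-certificate error in this format and the UI failure may have a different cause.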
Solution
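To resolve this issue, run the following command. It creates a Kubernetes Job in the affected cluster that extends the duration of the dex-dex-controller-ca certificate to 87600h, deletes the affected secrets so that they are reissued, and restarts the dex-dex-controller and dex deployments: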
# The heredoc delimiter is quoted so that the local shell does not expand $i and $(($i+1)) before the manifest reaches the cluster.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: d2iq-100649-fix-secret-reloader-sa
  namespace: kommander
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: d2iq-100649-fix-secret-reloader-role
  namespace: kommander
rules:
- apiGroups: [""]
  resources: ["pods", "secrets"]
  verbs: ["get", "list", "watch", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "patch"]
- apiGroups: ["cert-manager.io"]
  resources: ["certificates"]
  verbs: ["get", "list", "watch", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: d2iq-100649-fix-secret-reloader-rb
  namespace: kommander
subjects:
- kind: ServiceAccount
  name: d2iq-100649-fix-secret-reloader-sa
roleRef:
  kind: Role
  name: d2iq-100649-fix-secret-reloader-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: batch/v1
kind: Job
metadata:
  name: d2iq-100649-fix-secret-reloader-job
  namespace: kommander
spec:
  template:
    spec:
      serviceAccountName: d2iq-100649-fix-secret-reloader-sa
      containers:
      - name: reloader
        image: bitnami/kubectl:1.26.4
        command: [ "/bin/sh" ]
        args:
        - -c
        - |
          set -eux

          # Extend the CA certificate duration, then force reissue by deleting its secret.
          echo "patching dex-dex-controller-ca certificate"
          kubectl patch certificate -n kommander dex-dex-controller-ca --type=merge --patch='{"spec":{"duration":"87600h"}}'
          echo "deleting dex-dex-controller-ca secret"
          kubectl delete secret -n kommander dex-dex-controller-ca
          echo "waiting for dex-dex-controller-ca secret"
          # Poll up to 6 times, 10 seconds apart; exit non-zero on timeout so the Job reports failure.
          i=0
          while [ "$i" != 7 ]; do
            if [ "$i" = 6 ]; then
              echo "timed out waiting for dex-dex-controller-ca secret: exiting" && exit 1
            elif kubectl get secret -n kommander dex-dex-controller-ca; then
              break
            else
              i=$(($i+1)) && sleep 10
            fi
          done

          # Reissue the webhook server certificate from the renewed CA.
          echo "deleting dex-dex-controller-webhook-server-cert secret"
          kubectl delete secret -n kommander dex-dex-controller-webhook-server-cert
          echo "waiting for dex-dex-controller-webhook-server-cert secret"
          i=0
          while [ "$i" != 7 ]; do
            if [ "$i" = 6 ]; then
              echo "timed out waiting for dex-dex-controller-webhook-server-cert secret: exiting" && exit 1
            elif kubectl get secret -n kommander dex-dex-controller-webhook-server-cert; then
              break
            else
              i=$(($i+1)) && sleep 10
            fi
          done

          # Restart dex-dex-controller so that it loads the new certificate.
          echo "rolling out dex-dex-controller deployment"
          kubectl rollout restart deploy/dex-dex-controller -n kommander
          echo "waiting for dex-dex-controller deployment rollout"
          if ! kubectl rollout status deploy/dex-dex-controller -n kommander --timeout 5m; then
            echo "timed out waiting for dex-dex-controller deployment: exiting" && exit 1
          fi

          # Reissue the dex client certificate, then restart dex.
          echo "deleting dex-client-tls secret"
          kubectl delete secret -n kommander dex-client-tls
          echo "waiting for dex-client-tls secret"
          i=0
          while [ "$i" != 7 ]; do
            if [ "$i" = 6 ]; then
              echo "timed out waiting for dex-client-tls secret: exiting" && exit 1
            elif kubectl get secret -n kommander dex-client-tls; then
              break
            else
              i=$(($i+1)) && sleep 10
            fi
          done
          echo "rolling out dex deployment"
          kubectl rollout restart deploy/dex -n kommander
          echo "waiting for dex deployment rollout"
          if ! kubectl rollout status deploy/dex -n kommander --timeout 5m; then
            echo "timed out waiting for dex deployment: exiting" && exit 1
          fi
      restartPolicy: OnFailure
EOF
You can then check the status of the Job with:
kubectl get job -n kommander d2iq-100649-fix-secret-reloader-job
kubectl describe job -n kommander d2iq-100649-fix-secret-reloader-job
kubectl get pod -n kommander -l batch.kubernetes.io/job-name=d2iq-100649-fix-secret-reloader-job
If the job fails or is stuck, you can view the pod logs with:
kubectl logs -n kommander job/d2iq-100649-fix-secret-reloader-job
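Before cleaning up, it can help to confirm from a script that the Job has actually finished. The following is a minimal sketch that assumes a single-completion Job, as created above, so that .status.succeeded reads 1 on success; the helper name job_succeeded is hypothetical.

```shell
# Hypothetical helper: succeeds once the named Job in the kommander
# namespace reports one successfully finished pod.
job_succeeded() {
  succeeded=$(kubectl get job -n kommander "$1" -o jsonpath='{.status.succeeded}')
  [ "$succeeded" = "1" ]
}

# Usage (requires cluster access):
#   if job_succeeded d2iq-100649-fix-secret-reloader-job; then echo "Job finished"; fi
#
# Alternatively, block until the Job completes or ten minutes pass:
#   kubectl wait --for=condition=complete -n kommander job/d2iq-100649-fix-secret-reloader-job --timeout=10m
```

The polling helper suits scripts that need to branch on the result, while kubectl wait is simpler for interactive use.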
When the Job completes successfully, delete the Job and the ServiceAccount, Role, and RoleBinding created for it by executing the following commands:
kubectl delete serviceaccount -n kommander d2iq-100649-fix-secret-reloader-sa
kubectl delete role -n kommander d2iq-100649-fix-secret-reloader-role
kubectl delete rolebinding -n kommander d2iq-100649-fix-secret-reloader-rb
kubectl delete job -n kommander d2iq-100649-fix-secret-reloader-job
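Because the Job patches the dex-dex-controller-ca certificate to a duration of 87600h, the renewed certificate should report an expiry roughly ten years out. The following is a minimal sketch for checking this. It assumes cert-manager has populated status.notAfter on the Certificate resource, and the helper name cert_not_after is hypothetical.

```shell
# Hypothetical helper: print the expiry time that cert-manager records
# for a Certificate resource in the kommander namespace.
cert_not_after() {
  kubectl get certificate -n kommander "$1" -o jsonpath='{.status.notAfter}'
}

# Usage (requires cluster access):
#   cert_not_after dex-dex-controller-ca
```

An expiry that still lies in the past, or within the next 90 days, would suggest that the patch or the reissue did not take effect.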
This issue will be permanently resolved in DKP v2.7.2 and higher.