Problem
When using DKP 2.7.0 or 2.7.1, the DKP UI may become inaccessible after about 90 days, when the certificates used by the dex-controller webhook expire. The UI then displays the following message:
Internal Server Error; Failed to retrieve connector list
Investigating further, users may observe the following messages in the dex pod logs:
kubectl logs -n kommander deploy/dex
...
time="2024-05-05T05:05:05Z" level=info msg="Connectors: 2"
time="2024-05-05T05:05:05Z" level=info msg="Invoking 1 hooks to filter connectors"
time="2024-05-05T05:05:05Z" level=info msg="Calling connectors webhook dex-controller"
time="2024-05-05T05:05:05Z" level=error msg="Failed to filter connectors: could not call webhook: failed to send request: Post \"https://dex-dex-controller-webhook-service:18443/connectors\": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2024-05-05T05:05:05Z is after 2024-04-04T04:04:04Z"
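To confirm that an expired certificate is the cause, you can compare a cert-manager Certificate's status.notAfter time with the current time. The following is a minimal POSIX shell sketch, not a DKP tool. The helper names ts_digits, is_expired and check_certificate are hypothetical. It assumes the Certificate exists in the kommander namespace and that cert-manager reports notAfter as a UTC timestamp in the same format as the log above (for example 2024-04-04T04:04:04Z).

```shell
#!/bin/sh
# Hypothetical helpers (not part of DKP): check whether a cert-manager
# Certificate in the kommander namespace is past its notAfter time.
# Assumes .status.notAfter is a UTC timestamp such as 2024-04-04T04:04:04Z.

# Keep only the digits so two timestamps compare as integers:
# 2024-04-04T04:04:04Z -> 20240404040404
ts_digits() {
  printf '%s' "$1" | tr -cd '0-9'
}

# is_expired NOT_AFTER [NOW]
# Exit status: 0 if NOW is after NOT_AFTER, 1 if not, 2 if NOT_AFTER is empty.
# NOW defaults to the current UTC time.
is_expired() {
  _na=$(ts_digits "$1")
  [ -n "$_na" ] || return 2
  _now=$(ts_digits "${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}")
  [ "$_now" -gt "$_na" ]
}

# check_certificate NAME [NOW]: print whether the named Certificate has expired.
check_certificate() {
  _cert_na=$(kubectl get certificate -n kommander "$1" \
    -o jsonpath='{.status.notAfter}') || return 2
  _rc=0
  is_expired "$_cert_na" "$2" || _rc=$?
  case $_rc in
    0) echo "$1 EXPIRED (notAfter $_cert_na)" ;;
    1) echo "$1 valid (notAfter $_cert_na)" ;;
    *) echo "$1 has no notAfter in its status" >&2; return 2 ;;
  esac
}
```

For example, check_certificate dex-dex-controller-ca prints a line marked EXPIRED once the CA's notAfter time has passed. Stripping the timestamps down to digits only works because both values share the same fixed-width UTC format.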
Solution
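To resolve this issue, run the following command with kubectl pointed at the affected cluster. It creates a ServiceAccount, Role, RoleBinding and Job in the kommander namespace. The Job then renews the involved certificates by taking four actions. It extends the dex-dex-controller-ca Certificate duration to 87600h (10 years). It deletes the dex-dex-controller-ca, dex-dex-controller-webhook-server-cert and dex-client-tls secrets so that cert-manager reissues them. Finally, it restarts the dex-dex-controller and dex deployments so that they load the new certificates: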
# The quoted 'EOF' stops your local shell from expanding $i and $((...)) in the Job script.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: d2iq-100649-fix-secret-reloader-sa
  namespace: kommander
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: d2iq-100649-fix-secret-reloader-role
  namespace: kommander
rules:
  - apiGroups: [""]
    resources: ["pods", "secrets"]
    verbs: ["get", "list", "watch", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "patch"]
  - apiGroups: ["cert-manager.io"]
    resources: ["certificates"]
    verbs: ["get", "list", "watch", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: d2iq-100649-fix-secret-reloader-rb
  namespace: kommander
subjects:
  - kind: ServiceAccount
    name: d2iq-100649-fix-secret-reloader-sa
    namespace: kommander
roleRef:
  kind: Role
  name: d2iq-100649-fix-secret-reloader-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: batch/v1
kind: Job
metadata:
  name: d2iq-100649-fix-secret-reloader-job
  namespace: kommander
spec:
  template:
    spec:
      serviceAccountName: d2iq-100649-fix-secret-reloader-sa
      containers:
        - name: reloader
          image: bitnami/kubectl:1.26.4
          command: [ "/bin/sh" ]
          args:
            - -c
            - |
              set -eux
              # Extend the CA lifetime to 87600h (10 years) before it is reissued
              echo "patching dex-dex-controller-ca certificate"
              kubectl patch certificate -n kommander dex-dex-controller-ca --type=merge --patch='{"spec":{"duration":"87600h"}}'
              # Deleting each secret makes cert-manager reissue it; each loop polls up to 6 times, 10 seconds apart, and fails on timeout
              echo "deleting dex-dex-controller-ca secret"
              kubectl delete secret -n kommander dex-dex-controller-ca
              echo "waiting for dex-dex-controller-ca secret"
              i=0; while [ "$i" != 7 ]; do if [ "$i" = 6 ]; then echo "timed out waiting for dex-dex-controller-ca secret: exiting" && exit 1; elif kubectl get secret -n kommander dex-dex-controller-ca; then break; else i=$(($i+1)) && sleep 10; fi; done
              echo "deleting dex-dex-controller-webhook-server-cert secret"
              kubectl delete secret -n kommander dex-dex-controller-webhook-server-cert
              echo "waiting for dex-dex-controller-webhook-server-cert secret"
              i=0; while [ "$i" != 7 ]; do if [ "$i" = 6 ]; then echo "timed out waiting for dex-dex-controller-webhook-server-cert secret: exiting" && exit 1; elif kubectl get secret -n kommander dex-dex-controller-webhook-server-cert; then break; else i=$(($i+1)) && sleep 10; fi; done
              # Restart the controller so it serves the reissued webhook certificate
              echo "rolling out dex-dex-controller deployment"
              kubectl rollout restart deploy/dex-dex-controller -n kommander
              echo "waiting for dex-dex-controller deployment rollout"
              if ! kubectl rollout status deploy/dex-dex-controller -n kommander --timeout 5m; then echo "timed out waiting for dex-dex-controller deployment: exiting" && exit 1; fi
              echo "deleting dex-client-tls secret"
              kubectl delete secret -n kommander dex-client-tls
              echo "waiting for dex-client-tls secret"
              i=0; while [ "$i" != 7 ]; do if [ "$i" = 6 ]; then echo "timed out waiting for dex-client-tls secret: exiting" && exit 1; elif kubectl get secret -n kommander dex-client-tls; then break; else i=$(($i+1)) && sleep 10; fi; done
              # Restart dex so it picks up the reissued dex-client-tls secret
              echo "rolling out dex deployment"
              kubectl rollout restart deploy/dex -n kommander
              echo "waiting for dex deployment rollout"
              if ! kubectl rollout status deploy/dex -n kommander --timeout 5m; then echo "timed out waiting for dex deployment: exiting" && exit 1; fi
      restartPolicy: OnFailure
EOF
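The Job script repeats the same polling loop for each of the three secrets. It checks whether the secret exists up to six times and sleeps 10 seconds after each miss before giving up. If you adapt the script, that loop can be factored into a helper. The sketch below uses a hypothetical wait_for_secret function whose attempt count and interval default to the script's values. It returns a non-zero status on timeout, so a caller running under set -e stops at that point.

```shell
#!/bin/sh
# Hypothetical helper: poll until a secret exists in the kommander namespace.
# Usage: wait_for_secret NAME [ATTEMPTS] [INTERVAL_SECONDS]
# Defaults mirror the Job script: 6 attempts, sleeping 10 seconds after each miss.
wait_for_secret() {
  _attempts=${2:-6}
  _interval=${3:-10}
  _i=0
  while [ "$_i" -lt "$_attempts" ]; do
    # Success as soon as cert-manager has recreated the secret
    if kubectl get secret -n kommander "$1" >/dev/null 2>&1; then
      return 0
    fi
    _i=$((_i + 1))
    sleep "$_interval"
  done
  echo "timed out waiting for $1 secret" >&2
  return 1
}
```

Each inline loop in the Job script would then become a single call such as wait_for_secret dex-client-tls.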
You can then check the status of the Job with:
kubectl get job -n kommander d2iq-100649-fix-secret-reloader-job
kubectl describe job -n kommander d2iq-100649-fix-secret-reloader-job
kubectl get pod -n kommander -l batch.kubernetes.io/job-name=d2iq-100649-fix-secret-reloader-job
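Instead of re-running the status commands by hand, you can block until the Job finishes with kubectl wait. The following is a minimal sketch. The wait_for_fix_job function name and the 15-minute default timeout are assumptions chosen for illustration. The timeout covers the script's own limits of 60 seconds for each of the three secrets and 5 minutes for each of the two rollouts, about 13 minutes in total.

```shell
#!/bin/sh
# Hypothetical helper: wait until the fix Job reports the Complete condition.
# Usage: wait_for_fix_job [TIMEOUT]   (TIMEOUT defaults to 15m, an assumption)
wait_for_fix_job() {
  _job=d2iq-100649-fix-secret-reloader-job
  if kubectl wait --for=condition=complete "job/$_job" -n kommander --timeout="${1:-15m}"; then
    echo "job $_job completed"
  else
    echo "job $_job did not complete; inspect it with kubectl describe and kubectl logs" >&2
    return 1
  fi
}
```

Note that waiting for condition=complete does not return early if the Job fails. In that case the command waits for the full timeout and then returns a non-zero status.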
If the Job fails or appears stuck, view its pod logs with:
kubectl logs -n kommander job/d2iq-100649-fix-secret-reloader-job
After the Job completes successfully, delete the resources created for it:
kubectl delete serviceaccount -n kommander d2iq-100649-fix-secret-reloader-sa
kubectl delete role -n kommander d2iq-100649-fix-secret-reloader-role
kubectl delete rolebinding -n kommander d2iq-100649-fix-secret-reloader-rb
kubectl delete job -n kommander d2iq-100649-fix-secret-reloader-job
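The four cleanup commands can also be combined into a single kubectl delete call, because kubectl delete accepts several TYPE/NAME arguments. In this sketch the wrapper function name is hypothetical. The --ignore-not-found flag is added so that running the cleanup a second time does not report errors.

```shell
#!/bin/sh
# Hypothetical wrapper: delete all four helper resources in one kubectl call.
# --ignore-not-found makes re-running the cleanup harmless.
cleanup_fix_resources() {
  kubectl delete -n kommander --ignore-not-found \
    job/d2iq-100649-fix-secret-reloader-job \
    rolebinding/d2iq-100649-fix-secret-reloader-rb \
    role/d2iq-100649-fix-secret-reloader-role \
    serviceaccount/d2iq-100649-fix-secret-reloader-sa
}
```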
This issue is permanently resolved in DKP 2.7.2 and later.