Advisory ID: | D2IQ-2022-0002 |
Severity: | Critical |
Synopsis: | ClusterResourceSet "ApplyAlways" strategy creates an unbounded number of service account token Secrets |
Affected Products & Versions: | DKP 2.2.0 |
Issue date: | 03 May 2022 |
Updated on: | 03 May 2022 |
Problem Description
Every cluster has a set of core addons, such as implementations of the Container Network Interface (CNI) and the Container Storage Interface (CSI). The controller responsible for the core addons updates the Kubernetes resources for these addons when it should not.
For some addons, when the controller updates ServiceAccount resources, this causes the Kubernetes control plane (specifically the ServiceAccount controller) to create a new Secret resource that holds a token for the ServiceAccount.
Because the existing Secret resources are not deleted, the number of Secret resources increases without bound. We observed hundreds of new Secret resources created per day for every affected ServiceAccount. Large numbers of resources of any kind can impact the performance and stability of the cluster control plane, as well as the performance and stability of controllers that use such resources.
Context & Symptoms
The ServiceAccount updates themselves should not affect the cluster control plane or the applications running on the cluster. Over time, however, they cause large numbers of Secret resources to accumulate, which can degrade the performance and stability of the control plane and of any controllers that use Secret resources.
Workaround / Solution
ClusterResourceSets are resources that describe the configuration of core addons for each cluster. We recommend removing the ClusterResourceSet resources to prevent the controller from reconciling them further.
A ClusterResourceSet only needs to be reconciled once for a cluster, and is not needed again until the core addon is upgraded. It is therefore safe to delete the ClusterResourceSets: they are recreated for you when you upgrade the core addons for a cluster.
Remove ClusterResourceSets
To remove all ClusterResourceSet resources, in all namespaces:
kubectl delete clusterresourceset -A --all
Remove unused tokens
To remove the unused tokens that were generated, run the following in a bash shell.
As written, the command is a dry run and does not make changes to your cluster.
# Dry run by default; set DRYRUN="" to actually delete.
DRYRUN="--dry-run=server"
# Find ServiceAccount UIDs that own more than one token Secret, resolve each UID
# to its namespace and name, then delete every token Secret of that ServiceAccount
# except the one it currently references (.secrets[0]).
kubectl get secret -A -o jsonpath='{range .items[*]}{.metadata.annotations.kubernetes\.io/service-account\.uid}{"\n"}{end}' |
  grep -v '^$' |
  sort |
  uniq -d |
  xargs -I {} kubectl get sa -A -o jsonpath='{range .items[?(@.metadata.uid=="{}")]}{.metadata.namespace}{","}{.metadata.name}{"\n"}{end}' |
  while IFS=, read NAMESPACE NAME; do
    kubectl ${DRYRUN} -n $NAMESPACE delete secret $(
      kubectl -n $NAMESPACE get secrets -o jsonpath='{.items[?(@.metadata.annotations.kubernetes\.io/service-account\.uid=="'$(
        kubectl -n $NAMESPACE get sa $NAME -o jsonpath='{.metadata.uid}'
      )'")].metadata.name}' |
      sed "s@$(kubectl -n $NAMESPACE get sa $NAME -o jsonpath='{.secrets[0].name}')@@"
    )
  done
Verify that the list does not contain any Secrets for your workloads or applications before removing the dry-run flag. Once you are comfortable making the change permanent, set DRYRUN="" and run the command again. If there are thousands of Secrets, the deletion script may appear to pause for a long time between groups of service account tokens. This is expected behavior, because kubectl waits until all the deletions are complete before returning.
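The key step in the script is excluding the token that the ServiceAccount still references, so that only the surplus tokens are deleted. The following is a minimal sketch of that selection step in isolation. The function name unused_tokens and the sample Secret names are hypothetical, and the sketch matches whole names with grep where the script above uses a sed substitution.

```shell
# Hypothetical helper, for illustration only.
# $1:    name of the Secret the ServiceAccount currently references
# stdin: space- or newline-separated names of all token Secrets
#        that belong to that ServiceAccount
# Prints every name except $1, one per line.
unused_tokens() {
  tr -s ' ' '\n' | grep -v -x -F -- "$1"
}
```

For example, `echo "sa-token-aaaaa sa-token-bbbbb sa-token-ccccc" | unused_tokens sa-token-bbbbb` prints the first and third names only. Reviewing a list like this is a quick way to confirm that the token in use is not among the deletion candidates.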
This command generates a very large command line, which might exceed the default limits for your shell. If you receive a message similar to: `argument list too long: kubectl`, then you may need to adjust your shell limits using the ulimit command:
ulimit -s 65536
Then rerun the script.
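If raising the limit is not possible, another option may be to split the Secret names into smaller batches so that no single kubectl invocation receives the whole list. The following is a sketch under that assumption and is not part of the original workaround. The helper name in_batches and the batch size of 100 are hypothetical.

```shell
# Hypothetical helper, for illustration only.
# $1:    maximum number of names per invocation
# $2...: command to run for each batch
# stdin: whitespace-separated names
in_batches() {
  local size="$1"
  shift
  # xargs appends at most $size names to each invocation of the command.
  xargs -n "$size" "$@"
}
```

Inside the loop of the script, the list of unused Secret names could then be piped into `in_batches 100 kubectl -n "$NAMESPACE" delete secret $DRYRUN` instead of being passed as one argument list. Note that some xargs implementations run the command once even when the input is empty, in which case kubectl reports an error because no resource name was given.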
Note: If you rerun the dkp upgrade addons command using DKP v2.2.0, you must follow the workaround steps again, because this command recreates the ClusterResourceSet resources.
How to Identify Affected Products
To identify whether your cluster has this problem, run:
kubectl -n kube-system describe serviceaccount cluster-autoscaler
If the service account lists more than one entry under Tokens, your cluster is affected.
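To see which other service accounts have accumulated tokens, the same check can be applied across all namespaces. The following is a sketch, assuming that token Secrets carry the kubernetes.io/service-account.name annotation and have the type kubernetes.io/service-account-token. The function names are hypothetical.

```shell
# Hypothetical helpers, for illustration only.

# Prints one "namespace/serviceaccount" line per service account token Secret.
# Requires kubectl access to the cluster.
list_token_owners() {
  kubectl get secrets -A \
    --field-selector type=kubernetes.io/service-account-token \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.annotations.kubernetes\.io/service-account\.name}{"\n"}{end}'
}

# Reads "namespace/serviceaccount" lines on stdin and prints
# "<count> <namespace/serviceaccount>" for every owner with more than one token.
duplicated_token_owners() {
  sort | uniq -c | awk '$1 > 1 { print $1, $2 }'
}
```

Running `list_token_owners | duplicated_token_owners` prints one line per service account that has more than one token. Empty output suggests that no service account has accumulated extra tokens.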
For More Information
If you require further assistance, or if you have any further questions regarding this field notice, please submit a ticket at support.d2iq.com.