Chartmuseum is a core component of DKP. Its installation involves several steps, particularly in air-gapped environments, so understanding the installation workflow and the tools available for troubleshooting it is useful when deploying DKP. In air-gapped environments, when the CLI deploys Chartmuseum it first creates a deployment for the pod, then port-forwards from the local CLI to the newly deployed pod, and finally uploads the relevant artifacts to the Chartmuseum PVC before moving on to Gitea. If Chartmuseum fails to deploy, you will see a failure such as the following:
./dkp install kommander
✓ Ensuring applications repository fetcher is deployed
✓ Ensuring base resources are deployed
✓ Ensuring Flux is deployed
✓ Ensuring Kommander Root CA is deployed
✗ Ensuring Chartmuseum is deployed
failed to ensure "Chartmuseum is deployed": ...
Depending on the error, this may not provide insight into the underlying failure. If you are seeing CLI-side errors such as:
unable to upload charts from bundle dkp-kommander-charts-bundle.tar.gz: unable to upload chart from chart bundle to chartmuseum, cert-manager-crds.tgz, dkp-kommander-charts-bundle.tar.gz: unexpected reply from chart museum: 500 Internal Server Error
please ensure that you have unzipped the files beyond the nested NOTICES.TXT located within them. If you see a failure such as:
failed to get source: GitRepository.source.toolkit.fluxcd.io \"management\" not found
this typically means that the Chartmuseum pod is failing to deploy. To troubleshoot further, we can enable verbose logging for the DKP CLI by adding the -v 6 flag. This produces a large amount of output that can help identify the root cause:
dkp install kommander -v 6 ...
...
⡱ Deploying ChartMuseum GET https://adoll-0404-apiserver-66493267.us-west-2.elb.amazonaws.com:6443/api/v1/namespaces/kommander/secrets/chartmuseum-tls 200 OK in 67 milliseconds
GET https://adoll-0404-apiserver-66493267.us-west-2.elb.amazonaws.com:6443/api/v1/namespaces/kommander-flux/secrets/tls-root-ca 200 OK in 66 milliseconds
⢀⡱ Deploying ChartMuseum PATCH https://adoll-0404-apiserver-66493267.us-west-2.elb.amazonaws.com:6443/apis/networking.k8s.io/v1/namespaces/kommander/ingresses/kommander-helm-mirror?fieldManager=kommander-cli&force=true 200 OK in 68 milliseconds
⢄⡱ Deploying ChartMuseum GET https://adoll-0404-apiserver-66493267.us-west-2.elb.amazonaws.com:6443/api/v1/namespaces/kommander/secrets/admin-chartmuseum-credentials 200 OK in 65 milliseconds
GET https://adoll-0404-apiserver-66493267.us-west-2.elb.amazonaws.com:6443/api/v1/namespaces/kommander/secrets/chartmuseum-tls 200 OK in 66 milliseconds
⢄⡱ Deploying ChartMuseum GET https://adoll-0404-apiserver-66493267.us-west-2.elb.amazonaws.com:6443/api/v1/namespaces/kommander/pods?labelSelector=app.kubernetes.io%2Finstance%3Dchartmuseum%2Capp.kubernetes.io%2Fname%3Dchartmuseum 200 OK in 66 milliseconds
using pod chartmuseum-6b887b64b4-cc48c for port-forwarding
running the port-forwarder in the background
waiting for port-forward to be established
⢎⡱ Deploying ChartMuseum POST https://adoll-0404-apiserver-66493267.us-west-2.elb.amazonaws.com:6443/api/v1/namespaces/kommander/pods/chartmuseum-6b887b64b4-cc48c/portforward 101 Switching Protocols in 229 milliseconds
Forwarding from 127.0.0.1:41605 -> 8080
Forwarding from [::1]:41605 -> 8080
the port-forward has signaled ready
Parsing file /tmp/.kommander-installer-937882518/repo/common/airgapped/kustomization.yaml/kustomization.yaml
Parsing file /tmp/.kommander-installer-937882518/repo/common/base/kustomization.yaml/kustomization.yaml
Here we can see the CLI querying the API server to confirm that the Chartmuseum pod is up and running before establishing the port-forward from our CLI to the pod. If the last log lines show the CLI waiting for the Chartmuseum pod to be running, our next step should be to investigate the resources on the cluster. For Chartmuseum to be healthy, we should inspect the HelmRelease, pod, and PVC. If the pod is in a Pending state, the installation will not finish until we are able to get the pod up and running. In some cases, you may see that your pod is Pending due to an issue with the underlying volume:
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  8m8s (x132 over 22h)  default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
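The cluster-side resources mentioned above can be checked with commands along these lines. The kommander namespace and chartmuseum label selector match the verbose CLI output earlier; the HelmRelease and PVC names are assumptions and may differ in your release:

```shell
# Check the HelmRelease driving the Chartmuseum install (name assumed)
kubectl get helmrelease -n kommander chartmuseum

# Check the pod using the same label selector the CLI uses
kubectl get pods -n kommander \
  -l app.kubernetes.io/instance=chartmuseum,app.kubernetes.io/name=chartmuseum

# Describe the pod to surface events such as FailedScheduling
kubectl describe pods -n kommander \
  -l app.kubernetes.io/instance=chartmuseum,app.kubernetes.io/name=chartmuseum

# Check the PVC; a Pending PVC usually points at the CSI driver
kubectl get pvc -n kommander
```

If the pod events point at volume binding, as in the example above, the PVC and CSI driver are the next things to inspect.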
If this is the case, we will need to investigate the distribution-specific CSI pods; in AWS-based deployments, these are the ebs-csi-controller pods. If you are seeing failures, inspecting the logs of the ebs-plugin container will be the most helpful. 403 errors in the logs may mean that the IAM role assigned to your nodes is not allowed to create EBS volumes. In secure environments, it is essential to remember that the default storage class will attempt to create unencrypted volumes. If your organization requires EBS volumes to be encrypted, you will need to add the below to the parameters of your EBS storage class:
...
encrypted: "true"
kmsKeyId: <ARN FOR KEY>
...
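To pull the CSI controller logs discussed above, a command along these lines works in AWS-based deployments. The deployment and container names here assume the upstream aws-ebs-csi-driver defaults in kube-system; adjust them for your installation:

```shell
# Tail the ebs-plugin container of the EBS CSI controller and look
# for 403/AccessDenied errors returned by the EC2 API
kubectl logs -n kube-system deployment/ebs-csi-controller \
  -c ebs-plugin --tail=100
```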
After doing this, you may need to redeploy the Chartmuseum pod to confirm that your changes have resolved the issue. If not, a good troubleshooting step is to use the aws CLI from your node to verify that EBS volumes can be created; if they cannot, you will need to adjust your configuration until they can. After resolving any issues with the HelmRelease and pod, re-running the installation may fail with an error similar to 'another operation is in progress..'. If that is the case, it is safe to delete the Chartmuseum deployment, HelmRelease, and PVC/PV. After removing these components, you may re-run the 'dkp install kommander' command to continue your deployment.
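As a sketch of those last steps, assuming default resource names (chartmuseum for the deployment, HelmRelease, and PVC) and placeholder values for the availability zone and KMS key ARN:

```shell
# From a node, verify that the instance role can actually create an
# encrypted EBS volume (AZ and key ARN are placeholders)
aws ec2 create-volume \
  --availability-zone us-west-2a \
  --size 8 \
  --encrypted \
  --kms-key-id <ARN FOR KEY>

# If re-running the install fails with 'another operation is in
# progress..', clear the Chartmuseum resources and try again
kubectl delete deployment -n kommander chartmuseum
kubectl delete helmrelease -n kommander chartmuseum
kubectl delete pvc -n kommander chartmuseum
./dkp install kommander
```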