Users have reported that certain containers are failing to deploy with error messages that resemble the following:
Error: failed to create containerd container: create container failed validation: containers.Labels: label key and value greater than maximum size (4096 bytes), key: io.paketo.: invalid argument
This is related to a known issue in containerd 1.5.7, which, as of this writing, is the version packaged by default on nodes provisioned in AWS by Cluster API. More details can be found here:
You can resolve this by replacing your nodes with ones running containerd 1.4.13-d2iq.1, which is currently the tested and supported version packaged by default in AMIs built with the Konvoy Image Builder (KIB).
To get started on this process, you will need to download and install the Konvoy Image Builder to your client machine:
Follow the docs to download and install the konvoy-image binary, along with the contents of its directory and subdirectories, then set your AWS credential environment variables.
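For example, if you are using static credentials, you can export the standard AWS environment variables before running the build. The values below are placeholders; include AWS_SESSION_TOKEN only if you are using temporary credentials:
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_SESSION_TOKEN=<your-session-token>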
Run the build command with the base image file that corresponds to the OS distro your cluster is already using. Make sure you also set the --kubernetes-version flag to match your DKP version; see the release notes for your DKP version to find the correct Kubernetes version.
For example, if you are running an Ubuntu 20 cluster with DKP 2.2.2 in AWS us-west-2, you can run the following:
./konvoy-image build --region us-west-2 images/ami/ubuntu-20.yaml --kubernetes-version=1.22.8
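To confirm which Kubernetes version your existing nodes are currently running, you can also list them with kubectl (assuming your kubeconfig points at the workload cluster); the VERSION column shows the kubelet version on each node:
kubectl get nodes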
Once the image build is complete, it will give you an AMI ID. Note it down for later, when you update the nodepool.
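If it's convenient, you can stash the AMI ID in a shell variable so you can pass it to the update command later (the value below is a placeholder):
export AMI_ID=<AMI-ID-from-the-build-output>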
Check the existing nodepools in your cluster with the following command, making sure to fill in your cluster name:
./dkp get nodepools --cluster-name <cluster-name>
If you're using default values for your cluster, you should see only one nodepool (backed by a MachineDeployment), which represents your worker nodes.
If you're using your own custom nodepools, specify the nodepool that contains the nodes whose containerd version you want to change. If you have multiple nodepools to update, repeat the process for each one.
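If you prefer to inspect the underlying Cluster API objects directly, each nodepool corresponds to a MachineDeployment on the management cluster. A quick check, assuming your kubeconfig points at the management cluster:
kubectl get machinedeployments --all-namespaces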
Once you are ready to update the nodepool, you can put it all together with this command:
./dkp update nodepool aws --cluster-name <cluster-name> --ami <AMI-ID that you built with KIB> <nodepool-name>
From there, Cluster API will handle moving workloads around as it brings the new machines up and the old ones down.
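If you want to watch the rollout as it happens, you can follow the Cluster API Machine objects on the management cluster; you should see the new machines come up and the old ones drain and go away (again assuming kubectl access to the management cluster):
kubectl get machines --all-namespaces --watch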
In a cluster of four worker nodes, this took about twenty minutes in our testing. Please be patient, as this process might take longer depending on your node count or type of workload.
Once the process is complete, you can verify the new containerd version by connecting to a node directly via SSH (or aws ssm start-session) and running:
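containerd --version
You can also confirm the runtime version without logging in to each node, since the kubelet reports it in the CONTAINER-RUNTIME column:
kubectl get nodes -o wide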