Affected Component(s): Konvoy Air Gapped Deployments
Overview/Background
Deploying a Konvoy cluster in an air gapped environment involves extra steps and more complex configuration than a standard deployment. If the cluster deployment fails at any point because docker images cannot be pulled, and you have already verified that your private docker registry can serve those images (instructions on how to perform this process can be found here), the problem may lie in the configuration of your Konvoy deployment itself.
Solution
To deploy a Konvoy cluster in air gapped mode, you must modify cluster.yaml so that, instead of reaching out over the internet, Konvoy uses the air gapped artifacts provided in the local Konvoy folder. If cluster.yaml is misconfigured or a step is skipped, konvoy up will fail because it does not know it needs to reach your private docker registry. Check every configRepository item in your cluster.yaml to ensure it is correctly configured. For example, the default configuration for base addons in cluster.yaml is:
- configRepository: https://github.com/mesosphere/kubernetes-base-addons
  configVersion: stable-1.16-1.2.0
This must be modified so that the images for base addons are pulled properly:
- configRepository: /opt/konvoy/artifacts/kubernetes-base-addons
  configVersion: stable-1.16-1.2.0
  helmRepository:
    image: mesosphere/konvoy-addons-chart-repo:v1.4.4
You can see that we've changed the URL from https://github.com/mesosphere/ to /opt/konvoy/artifacts/, but what about that extra helmRepository information? In your Konvoy installation folder there is a folder called images. Inside this folder are all the docker images required for deployment. If you are trying to deploy Konvoy version 1.4.4, you will find a docker.io_mesosphere_konvoy-addons-chart-repo:v1.4.4.tar file inside.
We must tell Konvoy where to pull the helm charts for installation, and this image contains them. You must add this helmRepository information to every configRepository in your cluster.yaml. At present there are four configRepository entries that you should verify are configured properly, if you choose to enable them for your Konvoy deployment:
- configRepository: /opt/konvoy/artifacts/kubernetes-base-addons
- configRepository: /opt/konvoy/artifacts/kubeaddons-dispatch
- configRepository: /opt/konvoy/artifacts/kubeaddons-kommander
- configRepository: /opt/konvoy/artifacts/kubeaddons-kubeflow
You may have more or fewer items configured depending on how you've customized your installation, so ensure that cluster.yaml has all the information needed for a successful deployment.
Note that starting from Konvoy 1.5, the directive addonRepository should be used instead of helmRepository.
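The check described above can be partly automated. The following is a minimal sketch in Python, not something shipped with Konvoy: the helper name audit_config_repositories is hypothetical, and it scans the text of cluster.yaml line by line rather than parsing YAML properly. It assumes the layout shown in the examples above, where each addon entry starts with a configRepository line and any helmRepository (or addonRepository from Konvoy 1.5) key follows before the next entry.

```python
import re

# Matches a line such as "- configRepository: /opt/konvoy/artifacts/kubernetes-base-addons"
CONFIG_REPO = re.compile(r"^\s*-?\s*configRepository:\s*(\S+)")
# helmRepository before Konvoy 1.5, addonRepository from 1.5 onwards
CHART_REPO = re.compile(r"^\s*(helmRepository|addonRepository):")


def audit_config_repositories(cluster_yaml_text):
    """Return a list of (repository, problem) tuples found in the text."""
    entries = []  # each entry: [repository value, chart repo key seen?]
    for line in cluster_yaml_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # ignore commented-out lines
        repo = CONFIG_REPO.match(line)
        if repo:
            entries.append([repo.group(1), False])
        elif entries and CHART_REPO.match(line):
            entries[-1][1] = True

    problems = []
    for value, has_chart_repo in entries:
        if value.startswith(("http://", "https://")):
            problems.append((value, "points at a remote URL"))
        if not has_chart_repo:
            problems.append((value, "no helmRepository/addonRepository"))
    return problems


# Illustrative input only: one default (remote) entry and one air gapped entry.
SAMPLE = """\
- configRepository: https://github.com/mesosphere/kubernetes-base-addons
  configVersion: stable-1.16-1.2.0
- configRepository: /opt/konvoy/artifacts/kubeaddons-kommander
  helmRepository:
    image: mesosphere/konvoy-addons-chart-repo:v1.4.4
"""

for repository, problem in audit_config_repositories(SAMPLE):
    print(f"{repository}: {problem}")
```

To use it against a real file, read cluster.yaml into a string and pass it to the function. An empty result only means that no entry matched these two simple patterns, so it does not replace a manual review.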
What if you've triple checked your cluster.yaml and everything is configured properly, yet there is still an image pull failure during or after deployment of the cluster? There is one additional step that is critical for a successful deployment in an air gapped environment. The images.json file included with the Konvoy installation files contains a list of all of the required docker images. This file is used to push all of these images to your private docker registry via:
./konvoy config images seed
But this file is also used to configure containerd on your control plane and worker node hosts! If you used the wrong images.json to seed the images, or modified it before running konvoy up, most or all of the images might be present in your docker registry, but the cluster might not realize it needs to request docker images from that registry. To verify that you used the correct images.json file during deployment, inspect the contents of the containerd config.toml file:
cat /etc/containerd/config.toml
[plugins]
[plugins.cri]
[plugins.cri.registry]
[plugins.cri.registry.mirrors]
[plugins.cri.registry.mirrors."docker.elastic.co"]
endpoint = ["https://docker-registry.on-prem-domain:5000/v2/"]
[plugins.cri.registry.mirrors."docker.io"]
endpoint = ["https://docker-registry.on-prem-domain:5000/v2/","https://registry-1.docker.io"]
[plugins.cri.registry.mirrors."gcr.io"]
endpoint = ["https://docker-registry.on-prem-domain:5000/v2/"]
[plugins.cri.registry.mirrors."quay.io"]
endpoint = ["https://docker-registry.on-prem-domain:5000/v2/"]
We can see the plugins.cri.registry.mirrors section above contains mirrors for registries such as elastic, quay and gcr. If we do not have any elastic entries in our images.json file, the registry mirror
[plugins.cri.registry.mirrors."docker.elastic.co"]
endpoint = ["https://docker-registry.on-prem-domain:5000/v2/"]
will not be created. If there is no entry for elastic in our mirrors list, we won't be able to fetch our elastic images from the private docker registry!
The images.json files created for each version of Konvoy are not interchangeable, so accidentally using an older or newer file creates a scenario where, even if the correct images are uploaded to the private registry later, containerd will not know to look for them there. To ensure success with your air gapped deployments, always use the images.json file and related files included with your specific version of Konvoy. If you would like to quickly verify that config.toml has a mirror entry for every unique docker registry in images.json, you can list those registries with jq:
jq -r '[.images[].registry] | unique | .[]' images.json
This lists the unique registry addresses where your images reside, so you can compare them against the mirror entries in config.toml and identify anything missing from either images.json or config.toml.
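The comparison between the jq output and config.toml can also be scripted. The sketch below is a hedged illustration in Python, with hypothetical function names. It assumes images.json has the shape implied by the jq filter above (a top-level images array whose items carry a registry field) and that mirrors appear in config.toml as [plugins.cri.registry.mirrors."<registry>"] headers, as in the sample output shown earlier. It matches those headers with a regular expression instead of a full TOML parser.

```python
import json
import re

# Matches table headers such as [plugins.cri.registry.mirrors."quay.io"]
MIRROR_HEADER = re.compile(
    r'^\s*\[plugins\.cri\.registry\.mirrors\."([^"]+)"\]\s*$'
)


def registries_in_images_json(images_json_text):
    """Unique, sorted registry values (same result as the jq filter)."""
    data = json.loads(images_json_text)
    return sorted({image["registry"] for image in data["images"]})


def mirrors_in_config_toml(config_toml_text):
    """Registry names that have a mirror table in config.toml."""
    mirrors = set()
    for line in config_toml_text.splitlines():
        match = MIRROR_HEADER.match(line)
        if match:
            mirrors.add(match.group(1))
    return mirrors


def missing_mirrors(images_json_text, config_toml_text):
    """Registries listed in images.json that have no containerd mirror."""
    configured = mirrors_in_config_toml(config_toml_text)
    return [
        registry
        for registry in registries_in_images_json(images_json_text)
        if registry not in configured
    ]


# Illustrative data only; real files come from your Konvoy folder and node.
SAMPLE_IMAGES = '{"images": [{"registry": "docker.io"}, {"registry": "quay.io"}]}'
SAMPLE_TOML = '[plugins.cri.registry.mirrors."docker.io"]\nendpoint = ["https://docker-registry.on-prem-domain:5000/v2/"]\n'

for registry in missing_mirrors(SAMPLE_IMAGES, SAMPLE_TOML):
    print(f"no mirror configured for {registry}")
```

In practice you would read images.json from the Konvoy installation folder and /etc/containerd/config.toml from a cluster node, then pass both file contents to missing_mirrors.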
The images.json file is used at deploy time to configure containerd. Containerd's mirrors list is not automatically changed during subsequent konvoy up or konvoy deploy addon procedures, so any changes to the registry mirror information in /etc/containerd/config.toml must be made manually. After updating config.toml with any additional mirrors you require, you must restart containerd for it to load the changes.
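When adding a mirror by hand, the entry follows the same two-line pattern as the existing ones in config.toml. As a small sketch (the function name render_mirror_stanza is hypothetical, and the endpoint shown is simply the example registry address used earlier in this article), the text to append for a missing registry could be generated like this:

```python
def render_mirror_stanza(registry, endpoint):
    """Return the config.toml lines for a single registry mirror."""
    return (
        f'[plugins.cri.registry.mirrors."{registry}"]\n'
        f'endpoint = ["{endpoint}"]\n'
    )


# Example: the elastic mirror discussed above, pointing at the private registry.
print(
    render_mirror_stanza(
        "docker.elastic.co",
        "https://docker-registry.on-prem-domain:5000/v2/",
    ),
    end="",
)
```

The generated lines still have to be placed alongside the existing mirror entries on each control plane and worker node. On hosts where containerd runs as a systemd service, the restart is typically done with systemctl restart containerd, but confirm how the service is managed on your nodes.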