Background
During cluster creation, we can specify the private registry to pull images from, either via the --registry flags in the DKP CLI or through image overrides.
In Harbor, a single host can serve multiple registries, which are called projects. For example:
https://harbor-domain.com/project1
https://harbor-domain.com/project2
Issue
Depending on the use case, a cluster may have been created with the registry https://harborregistryhost.com/project1 configured, while certain custom workload images need to be pulled from https://harborregistryhost.com/project2 or from any number of other Harbor projects. Those pulls fail with an error similar to the following:
Pulling image "harborregistryhost.com/dkp-images/kubeflow:2.1.0-jupyter-spark-3.3.0-tensorflow-2.9.1-gpu"
Warning Failed 5s (x2 over 21s) kubelet Failed to pull image "harborregistryhost.com/dkp-images/kubeflow:2.1.0-jupyter-spark-3.3.0-tensorflow-2.9.1-gpu": rpc error: code = NotFound desc = failed to pull and unpack image "harborregistryhost.com/dkp-images/kubeflow:2.1.0-jupyter-spark-3.3.0-tensorflow-2.9.1-gpu": failed to resolve reference "harborregistryhost.com/dkp-images/kubeflow:2.1.0-jupyter-spark-3.3.0-tensorflow-2.9.1-gpu": harborregistryhost.com/dkp-images/kubeflow:2.1.0-jupyter-spark-3.3.0-tensorflow-2.9.1-gpu: not found
Warning Failed 5s (x2 over 21s) kubelet Error: ErrImagePull
If we SSH into one of the nodes and attempt to pull the image manually with the crictl CLI, a similar error is produced.
However, when the same image is pulled with the ctr CLI, the pull succeeds.
The issue is observed with a config.toml similar to the following:
[plugins."io.containerd.grpc.v1.cri".registry]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."*"]
endpoint = ["https://harborregistryhost.com/v2/pull-proxy-cache/"]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://harborregistryhost.com/v2/pull-proxy-cache/", "https://registry-1.docker.io"]
[plugins."io.containerd.grpc.v1.cri".registry.configs]
Solution
The issue is caused by the registry configuration in config.toml, specifically the endpoint of the wildcard entry and the missing mirror entry for the "base" Harbor hostname.
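To illustrate why the wildcard endpoint matters, here is a deliberately simplified Python model of how a mirror entry could be turned into a manifest URL. It assumes two things: a mirror entry for the exact registry host takes precedence over the "*" entry, and the path of an endpoint is used as a prefix for the repository path. The function names split_image and manifest_urls are hypothetical, and the real containerd resolver does considerably more (fallbacks, authentication, redirects), so treat this as a sketch rather than a description of containerd internals.

```python
from urllib.parse import urlsplit

# Mirror tables taken from the two config.toml variants in this article
# (only the entries relevant to the Harbor host are reproduced here).
BROKEN = {
    "*": ["https://harborregistryhost.com/v2/pull-proxy-cache/"],
}
FIXED = {
    "*": [
        "https://harborregistryhost.com/v2/pull-proxy-cache",
        "https://harborregistryhost.com/v2/dkp-kaptain-images",
        "https://harborregistryhost.com/v2/konvoy-images",
        "https://harborregistryhost.com/v2",
    ],
    "harborregistryhost.com": ["https://harborregistryhost.com/v2"],
}


def split_image(ref):
    """Split 'host/repository:tag' into (host, repository, tag).

    Assumes the reference has a tag and the host has no port.
    """
    name, _, tag = ref.rpartition(":")
    host, _, repository = name.partition("/")
    return host, repository, tag


def manifest_urls(mirrors, ref):
    """Return the manifest URLs this simplified model would try, in order."""
    host, repository, tag = split_image(ref)
    # Assumption: an exact host entry wins, otherwise the wildcard is used.
    endpoints = mirrors.get(host, mirrors.get("*", []))
    urls = []
    for endpoint in endpoints:
        base = endpoint.rstrip("/")
        if not urlsplit(base).path:
            base += "/v2"  # assumption: an endpoint without a path means /v2
        urls.append(f"{base}/{repository}/manifests/{tag}")
    return urls


if __name__ == "__main__":
    image = ("harborregistryhost.com/dkp-images/"
             "kubeflow:2.1.0-jupyter-spark-3.3.0-tensorflow-2.9.1-gpu")
    for label, mirrors in (("broken", BROKEN), ("fixed", FIXED)):
        print(label, manifest_urls(mirrors, image)[0])
```

In this model, the original configuration looks for the image under the pull-proxy-cache project, where it does not exist, while a mirror entry for harborregistryhost.com pointing at /v2 resolves the dkp-images project directly.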
With the following config.toml configuration, we are able to pull from any Harbor project, regardless of whether it is defined as an endpoint.
[plugins."io.containerd.grpc.v1.cri".registry]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."*"]
endpoint = ["https://harborregistryhost.com/v2/pull-proxy-cache", "https://harborregistryhost.com/v2/dkp-kaptain-images", "https://harborregistryhost.com/v2/konvoy-images", "https://harborregistryhost.com/v2"]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."harborregistryhost.com"]
endpoint = ["https://harborregistryhost.com/v2"]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://harborregistryhost.com/v2/pull-proxy-cache", "https://harborregistryhost.com/v2/dkp-kaptain-images", "https://harborregistryhost.com/v2/konvoy-images", "https://harborregistryhost.com/v2", "https://registry-1.docker.io"]