Problem
When deploying an AWS cluster with custom DHCP settings in your VPC, you may run into a situation where your instance data is modified when your instances are created. This is expected, but it can cause problems with the kubeadm join and init commands used to create the cluster, and your cluster may get stuck on the first control plane node. If this happens, you can SSH or open an SSM connection to your instance and view the cloud-init log file at /var/log/cloud-init-output.log. In that file, you may see entries similar to the below:
[2022-08-04 13:06:19] [preflight] Running pre-flight checks
[2022-08-04 13:06:19] [WARNING Hostname]: hostname "<invalid DNS entry>" could not be reached
[2022-08-04 13:06:19] [WARNING Hostname]: hostname "<invalid DNS entry>": lookup <invalid DNS entry> on 127.0.0.53:53: no such host
[2022-08-04 13:06:20] [certs] etcd/server serving cert is signed for DNS names [<invalid DNS entry> localhost] and IPs [<IP> 127.0.0.1 ::1]
[2022-08-04 13:06:20] [certs] etcd/peer serving cert is signed for DNS names [<invalid DNS entry> localhost] and IPs [<IP> 127.0.0.1 ::1]
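If you prefer to retrieve this log over SSM rather than SSH, a minimal sketch (assuming the AWS CLI and the Session Manager plugin are installed, and using <instance-id> as a placeholder for the stuck control plane instance) looks like this:
aws ssm start-session --target <instance-id>
# once the session opens, on the instance:
sudo tail -n 100 /var/log/cloud-init-output.log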
When the cloud-init script runs, it pulls metadata from /var/run/cloud-init/instance-data.json to gather variables for the kubeadm init and join scripts. In some cases, custom configuration will overwrite the JSON data in this file with names that are not resolvable. Because of this, attempts to deploy a DKP 2.X cluster using your pre-provisioned AWS infrastructure will fail.
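You can confirm on the instance itself what cloud-init will substitute into these templates. The below is a sketch assuming a cloud-init version new enough to provide the query subcommand (18.4 or later) and root access:
# value used by the default kubeadm templates
sudo cloud-init query --format '{{ ds.meta_data.local_hostname }}'
# value used by the fix described in the Solution below
sudo cloud-init query --format '{{ v1.local_hostname }}{{"."}}{{v1.region}}{{".compute.internal"}}'
# empty output here means the default name does not resolve
getent hosts "$(sudo cloud-init query --format '{{ ds.meta_data.local_hostname }}')"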
Solution
When you go to deploy a cluster, you can run the below command to get all of the relevant YAML for your cluster:
export CLUSTER_NAME=<name>
dkp create cluster aws --cluster-name=${CLUSTER_NAME} \
--dry-run \
--output=yaml \
<other flags for install> \
> ${CLUSTER_NAME}.yaml
After you have the ${CLUSTER_NAME}.yaml file, open it in your favorite text editor and view the contents. There will be multiple entries for the join and init configurations, located within the KubeadmControlPlane and KubeadmConfigTemplate objects, that look like the below:
KubeadmControlPlane:
  initConfiguration:
    localAPIEndpoint: {}
    nodeRegistration:
      kubeletExtraArgs:
        cloud-provider: aws
      name: '{{ ds.meta_data.local_hostname }}'
  joinConfiguration:
    discovery: {}
    nodeRegistration:
      kubeletExtraArgs:
        cloud-provider: aws
      name: '{{ ds.meta_data.local_hostname }}'
KubeadmConfigTemplate:
  joinConfiguration:
    discovery: {}
    nodeRegistration:
      kubeletExtraArgs:
        cloud-provider: aws
      name: '{{ ds.meta_data.local_hostname }}'
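To locate these entries quickly in the generated manifest, a simple grep of the template string works (assuming the defaults shown above have not been otherwise customized):
grep -n 'ds.meta_data.local_hostname' ${CLUSTER_NAME}.yaml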
Overall, there will be three entries. Inside each entry, you must replace '{{ ds.meta_data.local_hostname }}' with '{{ v1.local_hostname }}{{"."}}{{v1.region}}{{".compute.internal"}}'.
You can use the below command to replace the ds.meta_data.local_hostname entries within your YAML, or you can update each entry manually:
sed -i 's/{{ ds.meta_data.local_hostname }}/{{ v1.local_hostname }}{{"."}}{{v1.region}}{{".compute.internal"}}/g' ${CLUSTER_NAME}.yaml
After replacing the template, validate that each of your three entries looks like the below:
KubeadmControlPlane:
  initConfiguration:
    localAPIEndpoint: {}
    nodeRegistration:
      kubeletExtraArgs:
        cloud-provider: aws
      name: '{{ v1.local_hostname }}{{"."}}{{v1.region}}{{".compute.internal"}}'
  joinConfiguration:
    discovery: {}
    nodeRegistration:
      kubeletExtraArgs:
        cloud-provider: aws
      name: '{{ v1.local_hostname }}{{"."}}{{v1.region}}{{".compute.internal"}}'
KubeadmConfigTemplate:
  joinConfiguration:
    discovery: {}
    nodeRegistration:
      kubeletExtraArgs:
        cloud-provider: aws
      name: '{{ v1.local_hostname }}{{"."}}{{v1.region}}{{".compute.internal"}}'
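If you would rather verify the substitution programmatically than by eye, the below checks (a sketch assuming GNU grep) should report three new entries and no remaining old ones:
grep -c 'v1.local_hostname' ${CLUSTER_NAME}.yaml        # expect 3
grep -c 'ds.meta_data.local_hostname' ${CLUSTER_NAME}.yaml   # expect 0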
After validating, all that needs to be done is to run kubectl create -f ${CLUSTER_NAME}.yaml against your bootstrap or management cluster.
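Once the objects have been created, you can watch the first control plane come up through the standard Cluster API resources. The below is a sketch that assumes the cluster objects were created in the default namespace of your bootstrap or management cluster:
kubectl get cluster,kubeadmcontrolplane,machines
# if a machine stays stuck, re-check /var/log/cloud-init-output.log on that instance as described above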