When deploying a Kubernetes cluster with the cluster-api Azure provisioner in DKP 2.2.0 and 2.2.1, some users have encountered Azure Resource Manager blocking the resource creation requests when a policy that enforces the presence of tags [1] is assigned, even when the required tags are defined (see the example below).
Example:
dkp create cluster azure --cluster-name=<cluster name> --additional-tags=owner=<owner name>,env=<env name> --ssh-public-key-file="/path/ssh.key"
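For illustration, the kind of tag-enforcing policy referenced in [1] can be created by assigning Azure's built-in "Require a tag on resource groups" definition (the same definition that appears in the log below). A hypothetical assignment with the Azure CLI, using example names and scope, could look like:
az policy assignment create --name require-env-tag --scope /subscriptions/<subscription id> --policy 96670d01-0a4d-4649-9c89-2d3abc0a5025 --params '{"tagName": {"value": "env"}}'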
The issue is due to a race condition in the handling of custom user-defined tags. To confirm that this issue has been encountered, please check the capz controller log and look for an event similar to the one below:
E0524 14:49:14.965596 1 controller.go:317] controller/azurecluster "msg"="Reconciler error"
"error"="failed to reconcile cluster services: failed to reconcile resource group:
failed to create resource <RESOURCE GROUP NAME>/<CLUSTER NAME> (service: group):
resources.GroupsClient#CreateOrUpdate: Failure responding to request: StatusCode=403
-- Original Error: autorest/azure: Service returned an error. Status=403
Code=\"RequestDisallowedByPolicy\" Message=\"Resource '<CLUSTER NAME>' was disallowed
by policy. Reasons: '<Azure Policy Name>'. See error details for policy resource IDs.\"
Target=\"<CLUSTER NAME>\"
AdditionalInfo=[{\"info\":{\"evaluationDetails\":{\"evaluatedExpressions\":
[{\"expression\":\"type\",\"expressionKind\":\"Field\",
\"expressionValue\":\"Microsoft.Resources/subscriptions/resourcegroups\",
\"operator\":\"Equals\",\"path\":\"type\",\"result\":\"True\",
\"targetValue\":\"Microsoft.Resources/subscriptions/resourceGroups\"},
{\"expression\":\"tags[env]\",\"expressionKind\":\"Field\",\"operator\":\"Exists\",
\"path\":\"tags[env]\",\"result\":\"True\",\"targetValue\":\"false\"}],
\"reason\":\"<Azure Policy Name>\"},
\"policyAssignmentDisplayName\":\"Require a tag on resource groups\",
\"policyAssignmentId\":\"/subscriptions/b1ed8f2f-ceda-41d5-a0a8-95960b5340c2/providers/Microsoft.Authorization/policyAssignments/b269fb5e3c7346dcac26b7fc\",
\"policyAssignmentName\":\"b269fb5e3c7346dcac26b7fc\",
\"policyAssignmentScope\":\"/subscriptions/b1ed8f2f-ceda-41d5-a0a8-95960b5340c2\",
\"policyDefinitionDisplayName\":\"Require a tag on resource groups\",
\"policyDefinitionEffect\":\"deny\",
\"policyDefinitionId\":\"/providers/Microsoft.Authorization/policyDefinitions/96670d01-0a4d-4649-9c89-2d3abc0a5025\",
\"policyDefinitionName\":\"96670d01-0a4d-4649-9c89-2d3abc0a5025\"},\"type\":\"PolicyViolation\"}]"
"name"="<CLUSTER NAME>" "namespace"="default" "reconciler group"="infrastructure.cluster.x-k8s.io"
"reconciler kind"="AzureCluster"
This is a known issue [2] caused by a race condition in the handling of custom user-defined tags. It affects version 1.1.1 of the capz controller and has been fixed in version 1.3.1.
To check the version of the Azure cluster API (capz) controller, please execute the following command, replacing capz-controller-manager-xxxx-yyy with the actual pod name:
kubectl -n capz-system get pod capz-controller-manager-xxxx-yyy -ojsonpath='{.spec.containers[0].image}'
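On an affected DKP 2.2.0 or 2.2.1 cluster, the output is expected to reference the v1.1.1 image (assuming the same registry path as the fixed image below), for example:
us.gcr.io/k8s-artifacts-prod/cluster-api-azure/cluster-api-azure-controller:v1.1.1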
To resolve the issue, D2iQ will update the image of the cluster API Azure provider in the upcoming DKP release (version 2.2.2). Users who encounter this issue in DKP 2.2.0 or 2.2.1 can update the capz controller image with the following command:
kubectl -n capz-system set image deployment/capz-controller-manager manager=us.gcr.io/k8s-artifacts-prod/cluster-api-azure/cluster-api-azure-controller:v1.3.1
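This triggers a rolling update of the deployment; its progress can be followed with:
kubectl -n capz-system rollout status deployment/capz-controller-manager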
To confirm the change, please execute the following command:
kubectl get deployment capz-controller-manager -n capz-system -ojsonpath='{.spec.template.spec.containers[0].image}'
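The output should now show the fixed image:
us.gcr.io/k8s-artifacts-prod/cluster-api-azure/cluster-api-azure-controller:v1.3.1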
If the rollout is not triggered, manually delete the pod with the command below and confirm that the replacement pod was deployed with the fixed version of the image:
kubectl delete pod capz-controller-manager-xxxx-yyy -n capz-system
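Once the replacement pod is running, re-run the image check from above against the new pod name to verify that it uses the v1.3.1 image:
kubectl -n capz-system get pod capz-controller-manager-xxxx-yyy -ojsonpath='{.spec.containers[0].image}'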
References:
[1] https://docs.microsoft.com/en-us/azure/azure-resource-manager/management/tag-policies
[2] https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2240