Konvoy operators sometimes find that Gatekeeper kubeaddons pods are OOM-killed because a container reaches its allocated memory limit. When this happens, the resources of the controller-manager and audit pods should be adjusted to match the actual workload in the cluster.
By default, the resources allocated to the Gatekeeper controller-manager [1] and audit [2] containers are:
controllerManager:
  resources:
    limits:
      cpu: 1000m
      memory: 512Mi
    requests:
      cpu: 100m
      memory: 256Mi
audit:
  resources:
    limits:
      cpu: 1000m
      memory: 512Mi
    requests:
      cpu: 100m
      memory: 256Mi
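Before raising these limits, it helps to confirm that the containers are in fact being OOM-killed and to measure their actual usage. A sketch of such a check (the `app=gatekeeper` label selector is an assumption based on the upstream Gatekeeper chart and may differ in your deployment; `kubectl top` requires metrics-server):

```shell
# Show the last termination reason for each gatekeeper container;
# "OOMKilled" here confirms the memory limit is being hit
# (label selector is an assumption, adjust to your deployment):
kubectl get pods -n kubeaddons -l app=gatekeeper \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'

# Measure current CPU/memory usage to size the new requests and limits:
kubectl top pods -n kubeaddons -l app=gatekeeper
```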
The recommended way to override these values is to add the following stanza under the gatekeeper block in cluster.yaml:
- name: gatekeeper
  enabled: true
  values: |
    config:
      controllerManager:
        resources:
          limits:
            cpu: 1000m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
      audit:
        resources:
          limits:
            cpu: 1000m
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
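The edited cluster.yaml must be redeployed before the new values take effect. With the Konvoy CLI this is typically done as follows (command assumed from the standard Konvoy workflow; run it from the directory containing cluster.yaml):

```shell
# Re-run the addons deployment phase so the updated gatekeeper
# values from cluster.yaml are applied to the cluster
# (command assumed from the Konvoy CLI workflow):
konvoy deploy addons
```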
To confirm the changes were applied correctly, run the following kubectl commands. Note that Kubernetes normalizes quantities in its output, so a `1000m` CPU limit is reported as `1`:
kubectl get deploy gatekeeper-controller-manager -n kubeaddons -o jsonpath='{.spec.template.spec.containers[*].resources}'
{"limits":{"cpu":"1","memory":"512Mi"},"requests":{"cpu":"100m","memory":"256Mi"}}
kubectl get deploy gatekeeper-audit -n kubeaddons -o jsonpath='{.spec.template.spec.containers[*].resources}'
{"limits":{"cpu":"1","memory":"512Mi"},"requests":{"cpu":"100m","memory":"256Mi"}}
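For scripted verification, the jsonpath output can be checked directly in shell. A minimal sketch using the sample output above (in a real check, `actual` would be captured from the kubectl command rather than hard-coded):

```shell
#!/bin/sh
# Expected memory limit after the cluster.yaml change:
expected="512Mi"
# In practice: actual=$(kubectl get deploy gatekeeper-controller-manager \
#   -n kubeaddons -o jsonpath='{.spec.template.spec.containers[*].resources}')
actual='{"limits":{"cpu":"1","memory":"512Mi"},"requests":{"cpu":"100m","memory":"256Mi"}}'

# Succeeds only if the deployed limit matches the expected value:
if printf '%s' "$actual" | grep -q "\"memory\":\"$expected\""; then
  echo "memory limit OK"
else
  echo "memory limit mismatch" >&2
  exit 1
fi
```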
References:
[1] https://github.com/mesosphere/charts/blob/master/staging/gatekeeper/values.yaml#L56
[2] https://github.com/mesosphere/charts/blob/master/staging/gatekeeper/values.yaml#L65