Problem
After deploying Prometheus and Alertmanager, the configuration applied to the Alertmanager pods is not updated after running konvoy deploy addons. In this example, we initially deployed with an invalid entry in our http_config, causing our Alertmanager pods to crash loop:
kubectl get pods -A | grep alert
kubeaddons    alertmanager-prometheus-kubeaddons-prom-alertmanager-0   1/2   CrashLoopBackOff   13   42m
kubeaddons    alertmanager-prometheus-kubeaddons-prom-alertmanager-1   1/2   CrashLoopBackOff   13   42m
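The error causing the crash loop is usually visible in the Alertmanager container logs. Assuming the container inside the pod is named alertmanager (the default for Operator-managed Alertmanager pods), you can check with:
kubectl logs -n kubeaddons alertmanager-prometheus-kubeaddons-prom-alertmanager-0 -c alertmanager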
After removing this entry from our cluster.yaml, applying the new configuration, and running konvoy deploy addons again, the pods still redeploy with the old configuration. In some versions of Prometheus, the Operator responsible for generating a new configuration for Alertmanager does not have the proper scope and will not update the alertmanager-generated secret. You can validate these symptoms by running the following commands and checking whether the output differs between the two secrets:
Alertmanager secret:
kubectl get secret -n kubeaddons alertmanager-prometheus-kubeaddons-prom-alertmanager -o jsonpath={.data.'alertmanager\.yaml'} | base64 -d
global:
  resolve_timeout: 5m
  http_config: {}
  smtp_hello: localhost
  smtp_require_tls: true
route:
  receiver: "null"
  group_by:
  - job
  routes:
  - receiver: "null"
    match:
      alertname: test
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: "null"
templates:
- /etc/alertmanager/config/*.tmplate
Alertmanager-generated secret:
kubectl get secret -n kubeaddons alertmanager-prometheus-kubeaddons-prom-alertmanager-generated -o jsonpath={.data.'alertmanager\.yaml'} | base64 -d
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_hello: localhost
  smtp_require_tls: true
route:
  receiver: "null"
  group_by:
  - job
  routes:
  - receiver: "null"
    match:
      alertname: test
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: "null"
templates:
- /etc/alertmanager/config/*.tmplate
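As a shortcut, you can compare the two secrets directly with bash process substitution, reusing the same commands as above; any output from diff means the Operator has not reconciled the generated secret:
diff <(kubectl get secret -n kubeaddons alertmanager-prometheus-kubeaddons-prom-alertmanager -o jsonpath={.data.'alertmanager\.yaml'} | base64 -d) \
     <(kubectl get secret -n kubeaddons alertmanager-prometheus-kubeaddons-prom-alertmanager-generated -o jsonpath={.data.'alertmanager\.yaml'} | base64 -d)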
Solution
To allow the Prometheus Operator to update the configuration in the Alertmanager secret, you need to edit the Alertmanager object, prometheus-kubeaddons-prom-alertmanager, using the following command:
kubectl edit -n kubeaddons alertmanagers.monitoring.coreos.com prometheus-kubeaddons-prom-alertmanager
Once you are inside the editor, remove the .spec.alertmanagerConfigNamespaceSelector: {} and .spec.alertmanagerConfigSelector: {} entries from the object:
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus-kubeaddons
    meta.helm.sh/release-namespace: kubeaddons
  labels:
    app: prometheus-operator-alertmanager
    heritage: Helm
    release: prometheus-kubeaddons
  name: prometheus-kubeaddons-prom-alertmanager
  namespace: kubeaddons
spec:
  alertmanagerConfigNamespaceSelector: {}
  alertmanagerConfigSelector: {}
  externalUrl: http://prometheus-kubeaddons-prom-alertmanager.kubeaddons:9093
  image: quay.io/prometheus/alertmanager:v0.21.0
  ....
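If you prefer a non-interactive approach, the same two fields can be removed with a JSON patch. This is a sketch that assumes both fields are currently set on your Alertmanager object (a JSON Patch remove operation fails if the path is absent):
kubectl patch -n kubeaddons alertmanagers.monitoring.coreos.com prometheus-kubeaddons-prom-alertmanager \
  --type=json \
  -p='[{"op": "remove", "path": "/spec/alertmanagerConfigNamespaceSelector"}, {"op": "remove", "path": "/spec/alertmanagerConfigSelector"}]'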
Once the selectors have been removed, the Prometheus Operator will reconcile the state of both secrets and your most recent configuration will be deployed. You can validate the fix by ensuring that the contents of both secrets are identical.
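After the generated secret is updated, the Alertmanager pods should also stop crash looping and return to a Running state, which you can confirm with the same command used earlier:
kubectl get pods -A | grep alert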