Problem
After deploying Prometheus and Alertmanager, the configuration applied to the Alertmanager pods is not updated after running konvoy deploy addons. In this example, we initially deployed with an invalid entry in our http_config, causing our Alertmanager pods to crash loop:
kubectl get pods -A | grep alert
kubeaddons    alertmanager-prometheus-kubeaddons-prom-alertmanager-0   1/2   CrashLoopBackOff   13   42m
kubeaddons    alertmanager-prometheus-kubeaddons-prom-alertmanager-1   1/2   CrashLoopBackOff   13   42m
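The error causing the crash loop is usually visible in the Alertmanager container logs. Assuming the container inside the pod is named alertmanager (the default for Operator-managed Alertmanager pods), you can check with:
kubectl logs -n kubeaddons alertmanager-prometheus-kubeaddons-prom-alertmanager-0 -c alertmanager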
After removing this entry from our cluster.yaml, applying the new configuration, and running konvoy deploy addons again, the pods still redeploy with the old configuration. In some versions of Prometheus, the Operator responsible for generating a new configuration for Alertmanager does not have the proper scope and will not update the alertmanager-generated secret. You can validate these symptoms by running the following commands and checking whether the output differs between the two secrets:
Alertmanager secret:
kubectl get secret -n kubeaddons alertmanager-prometheus-kubeaddons-prom-alertmanager -o jsonpath={.data.'alertmanager\.yaml'} | base64 -d
global:
  resolve_timeout: 5m
  http_config: {}
  smtp_hello: localhost
  smtp_require_tls: true
route:
  receiver: "null"
  group_by:
  - job
  routes:
  - receiver: "null"
    match:
      alertname: test
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: "null"
templates:
- /etc/alertmanager/config/*.tmplate
Alertmanager-generated secret:
kubectl get secret -n kubeaddons alertmanager-prometheus-kubeaddons-prom-alertmanager-generated -o jsonpath={.data.'alertmanager\.yaml'} | base64 -d
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_hello: localhost
  smtp_require_tls: true
route:
  receiver: "null"
  group_by:
  - job
  routes:
  - receiver: "null"
    match:
      alertname: test
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
- name: "null"
templates:
- /etc/alertmanager/config/*.tmplate
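As a shortcut, you can compare the two secrets directly with bash process substitution, reusing the same commands as above; any output from diff means the Operator has not reconciled the generated secret:
diff <(kubectl get secret -n kubeaddons alertmanager-prometheus-kubeaddons-prom-alertmanager -o jsonpath={.data.'alertmanager\.yaml'} | base64 -d) \
     <(kubectl get secret -n kubeaddons alertmanager-prometheus-kubeaddons-prom-alertmanager-generated -o jsonpath={.data.'alertmanager\.yaml'} | base64 -d)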
Solution
To allow the Prometheus Operator to update the configuration in the Alertmanager secret, you need to edit the Alertmanager object, prometheus-kubeaddons-prom-alertmanager, using the following command:
kubectl edit -n kubeaddons alertmanagers.monitoring.coreos.com prometheus-kubeaddons-prom-alertmanager
Once you are inside the editor, remove the .spec.alertmanagerConfigNamespaceSelector: {} and .spec.alertmanagerConfigSelector: {} entries from the object:
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus-kubeaddons
    meta.helm.sh/release-namespace: kubeaddons
  labels:
    app: prometheus-operator-alertmanager
    heritage: Helm
    release: prometheus-kubeaddons
  name: prometheus-kubeaddons-prom-alertmanager
  namespace: kubeaddons
spec:
  alertmanagerConfigNamespaceSelector: {}
  alertmanagerConfigSelector: {}
  externalUrl: http://prometheus-kubeaddons-prom-alertmanager.kubeaddons:9093
  image: quay.io/prometheus/alertmanager:v0.21.0
  ....
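If you prefer a non-interactive approach, the same two fields can be removed with a JSON patch. This is a sketch that assumes both fields are currently set on your Alertmanager object (a JSON Patch remove operation fails if the path is absent):
kubectl patch -n kubeaddons alertmanagers.monitoring.coreos.com prometheus-kubeaddons-prom-alertmanager \
  --type=json \
  -p='[{"op": "remove", "path": "/spec/alertmanagerConfigNamespaceSelector"}, {"op": "remove", "path": "/spec/alertmanagerConfigSelector"}]'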
Once the selectors have been removed, the Prometheus Operator will reconcile the state of both secrets and your most recent configuration will be deployed. You can validate the fix by ensuring that the contents of both secrets are identical.
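After the generated secret is updated, the Alertmanager pods should also stop crash looping and return to a Running state, which you can confirm with the same command used earlier:
kubectl get pods -A | grep alert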