Prometheus fails to evaluate apiserver_request_total default rule due to loading too many samples into memory
Overview/Background
When running prometheus-operator version 8.12.14 or newer, the code_verb:apiserver_request_total:increase30d recording rule may fail to evaluate because its query would load more samples into memory than Prometheus allows. When this happens, Prometheus logs messages such as:
level=warn ts=2020-07-22T12:11:22.244Z caller=manager.go:534 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: code_verb:apiserver_request_total:increase30d\nexpr: sum by(code, verb) (increase(apiserver_request_total{job=\"apiserver\"}[30d]))\n" err="query processing would load too many samples into memory in query execution"
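To confirm that a cluster is hitting this failure, you can count matching messages in the Prometheus logs. The following is a minimal sketch: the helper name count_sample_limit_failures is hypothetical, and the namespace and pod name in the usage comment are placeholders to replace with the values from your own cluster.

```shell
# Count rule-evaluation failures caused by the sample limit.
# Reads log lines from stdin and prints the number of matching lines.
# (Hypothetical helper name; not part of Konvoy or Prometheus.)
count_sample_limit_failures() {
  # grep -c exits non-zero when nothing matches; "|| true" keeps the
  # helper usable in scripts that run with "set -e".
  grep -c 'query processing would load too many samples into memory' || true
}

# Example usage (placeholders, adjust to your cluster):
#   kubectl logs -n <namespace> <prometheus-pod> -c prometheus | count_sample_limit_failures
```

A non-zero count indicates that at least one rule evaluation was rejected because of the sample limit.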
Solution
As a temporary workaround, increase the Prometheus QuerySpec maxSamples setting (the maximum number of samples a single query can load into memory) to a value suited to your cluster via your Konvoy cluster's cluster.yaml:
- name: prometheus
  enabled: true
  values: |
    prometheus:
      prometheusSpec:
        query:
          maxSamples:
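What counts as a suitable value depends on how many samples the 30-day query has to load. As a rough sketch, the number of samples is approximately the number of apiserver_request_total series multiplied by the number of scrapes in the 30-day window. The helper name estimate_samples and the input values below are illustrative assumptions, not measurements; you can obtain the real series count with a PromQL query such as count(apiserver_request_total{job="apiserver"}).

```shell
# Rough estimate of the samples loaded by
# increase(apiserver_request_total{job="apiserver"}[30d]).
# (Hypothetical helper; all inputs are assumptions to replace with
# values taken from your own cluster.)
estimate_samples() {
  series_count=$1        # number of apiserver_request_total series
  scrape_interval_s=$2   # scrape interval in seconds
  window_days=$3         # range of the query in days
  echo $(( series_count * window_days * 86400 / scrape_interval_s ))
}

# Example: 1000 series scraped every 30 seconds over a 30-day window.
estimate_samples 1000 30 30   # prints 86400000
```

In this example the estimate exceeds 50,000,000, which is the default of the Prometheus --query.max-samples flag, so the rule would fail unless maxSamples is raised above the estimate. A higher limit also allows individual queries to use more memory, so it is worth leaving only modest headroom.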
After making the change in your cluster.yaml, you can update your cluster's addons by executing:
konvoy deploy addons -y
To permanently resolve the issue, you should upgrade to the latest version of Konvoy (and the associated Kubernetes Base Addons).