Prometheus fails to evaluate the default apiserver_request_total recording rule because it would load too many samples into memory

Overview/Background

When running the prometheus-operator chart at version 8.12.14 or newer, the default code_verb:apiserver_request_total:increase30d recording rule, which computes a 30-day increase, may fail to evaluate because the query would load more samples into memory than Prometheus allows for a single query. When this happens, the Prometheus logs contain messages such as:

    level=warn ts=2020-07-22T12:11:22.244Z caller=manager.go:534 component="rule manager" group=kube-apiserver.rules msg="Evaluating rule failed" rule="record: code_verb:apiserver_request_total:increase30d\nexpr: sum by(code, verb) (increase(apiserver_request_total{job=\"apiserver\"}[30d]))\n" err="query processing would load too many samples into memory in query execution"
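
To get a sense of why a 30-day range is expensive, the sketch below counts the raw samples that fall inside the [30d] window of the rule's expression. The series count and scrape interval are hypothetical inputs, not values from this article, and Prometheus does not necessarily hold all of these samples in memory at once, so treat the result as an order-of-magnitude guide rather than the exact figure checked against maxSamples (whose Prometheus default is 50,000,000).

```python
# Rough estimate of how many raw samples a range selector such as [30d] spans.
# Assumptions (hypothetical, not from the article): every series is scraped at
# a fixed interval and has existed for the whole window.

SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_series(window_days: float, scrape_interval_s: float) -> int:
    """Number of samples a single series stores inside the window."""
    return int(window_days * SECONDS_PER_DAY // scrape_interval_s)


def samples_in_window(series_count: int, window_days: float, scrape_interval_s: float) -> int:
    """Total raw samples across all selected series inside the window."""
    return series_count * samples_per_series(window_days, scrape_interval_s)


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 apiserver_request_total series scraped every 30s.
    print("samples per series:", samples_per_series(30, 30))
    print("samples in window:", samples_in_window(1_000, 30, 30))
```

With these inputs each series contributes 86,400 samples over 30 days, so even a modest number of series adds up to tens of millions of samples.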

Solution

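As a temporary workaround, increase the per-query sample limit, set through the maxSamples field of the Prometheus QuerySpec, by adding it to the prometheus entry of the addons list in your Konvoy cluster's cluster.yaml. The Prometheus default is 50,000,000, so set a value above that which your Prometheus pods have enough memory to support: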

    - name: prometheus
      enabled: true
      values: |
        prometheus:
          prometheusSpec:
            query:
              maxSamples:  # set an integer larger than the default of 50000000

After making the change in your cluster.yaml, you can update your cluster's addons by executing:

    konvoy deploy addons -y

To resolve the issue permanently, upgrade to the latest version of Konvoy and its associated Kubernetes Base Addons.