Overview
Based on your use case and workloads, Loki may require configuration changes in your environments.
If workloads are sending large payloads to Loki, you may see errors in the logs for the grafana-loki-loki-distributed-gateway pod, similar to the following:
[error] 9#9: *1956 client intended to send too large body: 37521 bytes, client: 192.168.2.78, server: , request: "POST /loki/api/v1/push HTTP/1.1", host: "grafana-loki-loki-distributed-gateway.kommander.svc.cluster.local"
This error message shows the payload size that was rejected (37521 bytes in the example above). Monitor the logs for a while to get a sense of the maximum payload size you need to allow.
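To monitor for these errors, you can tail the gateway logs; a minimal sketch, assuming the kommander namespace and the gateway Deployment name used elsewhere in this article:

kubectl -n kommander logs deployment/grafana-loki-loki-distributed-gateway --tail=500 | grep 'too large body'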
Also, in some situations, the nginx instance in the logging components may return a 413 error with the following HTML:
<html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx/1.19.10</center>
</body>
</html>
You may see this error in logs for fluent-bit, along with frequent OOM kills for fluent-bit pods running on the cluster. These OOM kills occur because fluent-bit continues to buffer logs that it cannot ship to Loki and eventually it runs out of buffer space and crashes.
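To spot the OOM kills, you can check the pod list and each pod's last-state details; a sketch, assuming fluent-bit runs in the kommander namespace (adjust the namespace and pod name to your cluster):

kubectl get pods -A | grep fluent-bit
kubectl -n kommander describe pod <fluent-bit-pod-name> | grep -A 3 'Last State'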
Solution
Reconfigure the Loki gateway (nginx):
Note: This part of the process only applies to Kommander Management clusters. If you are applying this change on a managed/attached cluster, follow this article instead:
https://support.d2iq.com/hc/en-us/articles/8665246939668
The first step is to get the data into the Loki environment. Do this by adjusting the nginx portion of the Loki ConfigMap to override the default maximum body size of 1M. Near the top of your kommander.yaml file, replace:
grafana-loki: null
with
grafana-loki:
  values: |
    gateway:
      nginxConfig:
        httpSnippet: |-
          client_max_body_size 50M;
        serverSnippet: |-
          client_max_body_size 50M;
You can now deploy (or redeploy) Kommander.
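Once the deploy completes, you can verify that the snippet was rendered into the gateway's nginx configuration; a sketch, where the ConfigMap name is an assumption based on the chart's naming convention:

kubectl -n kommander get configmap grafana-loki-loki-distributed-gateway -o yaml | grep client_max_body_size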
This change allows nginx to receive larger data payloads and pass them into the Loki environment. However, Loki itself still needs to be configured to pass this data around internally between its components, which it does via gRPC. The next part can only be configured once the cluster is running, as these changes are not supported by the Loki Helm chart at present.
Reconfigure Loki components
To enable Loki to deal with larger gRPC messages, we need to add some options to the grafana-loki-loki-distributed ConfigMap: specifically, the fields in the grpc_client_config object and the grpc_server_max_[recv|send]_msg_size fields in the server object. Per the official Loki documentation, we should end up with something along the lines of the following to insert into your existing ConfigMap alongside the original contents:
server:
  grpc_server_max_recv_msg_size: 52428800
  grpc_server_max_send_msg_size: 52428800
query_scheduler:
  grpc_client_config:
    max_recv_msg_size: 52428800
    max_send_msg_size: 52428800
frontend_worker:
  grpc_client_config:
    max_recv_msg_size: 52428800
    max_send_msg_size: 52428800
ingester_client:
  grpc_client_config:
    max_recv_msg_size: 52428800
    max_send_msg_size: 52428800
The value needs to match the nginx value above: 52428800 bytes is 50 × 1024 × 1024, i.e. the 50M set in client_max_body_size. Derive your own value from the payload sizes in the error messages you observed earlier.
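One way to apply the change is to edit the ConfigMap in place (the restart step follows below):

kubectl -n kommander edit configmap grafana-loki-loki-distributed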
Once that is done, restart the Loki distributor Deployment:
kubectl -n kommander rollout restart Deployment/grafana-loki-loki-distributed-distributor
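You can watch the rollout status to confirm the restart completes:

kubectl -n kommander rollout status Deployment/grafana-loki-loki-distributed-distributor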
Conclusion
Once you have restarted Loki, you can go through the same log-checking process to confirm that it is now working as expected. This article gives you a good starting point for increasing the payload sizes for Loki, but there are other values that you may need to look at, such as those in the storage_config object if you use a custom store, the grpc_server_max_concurrent_streams value, and maybe even the rate limit values.
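For example, if the distributor starts returning HTTP 429 responses as log volume grows, the per-tenant ingestion limits live in the limits_config object; a sketch with illustrative values that you would tune to your own log volume:

limits_config:
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32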