Application logs in Grafana Logging are not up to date, and the FluentBit logs contain error lines similar to:
kubectl logs -l=app.kubernetes.io/name=fluentbit -n kommander --max-log-requests=7 -f
[2023/01/04 07:43:17] [error] [net] TCP connection failed: logging-operator-logging-fluentd.kommander.svc.cluster.local:24240 (Connection refused)
[2023/01/04 07:43:17] [error] [output:forward:forward.0] no upstream connections available
[2023/01/04 07:43:17] [ warn] [engine] failed to flush chunk '1-1672818196.694387935.flb', retry in 6 seconds: task_id=2, input=tail.0 > output=forward.0 (out_id=0)
[2023/01/04 07:43:17] [error] [net] TCP connection failed: logging-operator-logging-fluentd.kommander.svc.cluster.local:24240 (Connection refused)
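Before restarting anything, it can help to confirm whether FluentD is actually down, since the refused connection points at its Service. A hedged check (the label selector is an assumption; the Service name is taken from the error above):

```shell
# Check whether the FluentD pods behind the refused Service are running
kubectl get pods -n kommander -l app.kubernetes.io/name=fluentd

# Check whether the Service has any ready endpoints; an empty list
# explains the "Connection refused" / "no upstream connections" errors
kubectl get endpoints logging-operator-logging-fluentd -n kommander
```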
First, attempt to restart the FluentBit DaemonSet by running the command below:
kubectl rollout restart ds logging-operator-logging-fluentbit -n kommander
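To confirm the restart completed and the forward output has recovered, the rollout can be watched and the logs re-checked (DaemonSet name and label selector are taken from the commands above):

```shell
# Wait for all restarted FluentBit pods to become ready
kubectl rollout status ds logging-operator-logging-fluentbit -n kommander

# Re-tail the logs; the "Connection refused" errors should stop appearing
kubectl logs -l=app.kubernetes.io/name=fluentbit -n kommander --max-log-requests=7 -f
```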
TLS errors in FluentBit usually occur when the FluentD pod has been restarted for some reason (e.g. configuration changes), which requires FluentBit to be restarted manually. This has been fixed in DKP 2.2.3, 2.3.1, and 2.4.
If the issue is still occurring, increase the memory request/limit of the FluentD deployment, following this guide. For DKP 2.2 and below, make sure to restart the FluentBit DaemonSet after making any configuration changes.
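With the logging-operator, FluentD resources are normally set on the Logging custom resource rather than edited on the workload directly. A minimal sketch, assuming the resource is named logging-operator-logging (the memory and CPU values below are illustrative, not recommendations):

```yaml
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: logging-operator-logging
  namespace: kommander
spec:
  fluentd:
    resources:
      requests:
        memory: 512Mi   # illustrative value; size to your log volume
        cpu: 500m
      limits:
        memory: 1Gi     # illustrative value
        cpu: "1"
```

Apply the change with kubectl apply (or by editing the Logging resource in place); the operator then reconciles the FluentD workload with the new resources.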
If increasing FluentD's resources is not enough, the FluentD deployment can also be scaled out.
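Scaling is likewise configured on the Logging custom resource; a sketch under the same naming assumption as above:

```yaml
spec:
  fluentd:
    scaling:
      replicas: 3   # illustrative replica count
```

Note that with multiple FluentD replicas, FluentBit's forward output distributes chunks across them, so buffer storage and output ordering behavior should be reviewed after scaling.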