Problem
After a network disruption, you may notice errors similar to the following in your Istio sidecar pods:
warning envoy config StreamAggregatedResources gRPC config stream closed: 8, grpc: received message larger than max (55215921 vs. 4194304)
warning envoy config StreamAggregatedResources gRPC config stream closed: 8, grpc: received message larger than max (55215931 vs. 4194304)
This is a known issue in Istio, where new message content is appended to the existing message rather than overwriting the old data. It occurs after a large number of errors are generated in a row; once the maximum gRPC message size is exceeded, the state will not recover on its own.
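To gauge how widespread the problem is, you can search recent sidecar logs for this error across the cluster. Below is a minimal sketch, assuming the sidecar container uses Istio's default name, istio-proxy:
#!/usr/bin/env bash
# Print every pod whose istio-proxy container logged the gRPC size error in the last hour.
for pod in $(kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'); do
  ns=${pod%%/*}
  name=${pod##*/}
  if kubectl logs -n "${ns}" "${name}" -c istio-proxy --since=1h 2>/dev/null \
      | grep -q "received message larger than max"; then
    echo "${ns}/${name}"
  fi
done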
Solution
There are a few ways you can resolve this error if your Istio proxy containers are affected by it. The first is to restart any pods that are logging these errors; restarting only the istio-sidecar container will not be sufficient, so this means a rolling restart of all affected pods. In larger environments, this can be a challenging task; for non-production environments, you can use a script similar to the one below:
#!/usr/bin/env bash
export IFS=$'\n'
# istioctl proxy-status lists each proxy as <pod>.<namespace>; skip the header row.
for i in $(istioctl proxy-status | awk '{print $1}' | grep -Eiv "^NAME$"); do
  POD=$(echo -n "${i}" | awk -F. '{print $1}')
  NAMESPACE=$(echo -n "${i}" | awk -F. '{print $2}')
  echo "About to kill ${POD} in namespace ${NAMESPACE}"
  kubectl delete pod -n "${NAMESPACE}" "${POD}"
  echo "Waiting before next delete"
  sleep 15
done
Please note that this script does not take replicas, pod disruption budgets, or the availability of your application into account. For this reason, we recommend testing it in lower environments or modifying it to fit your specific environment's needs.
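If the affected workloads are managed by Deployments, a gentler alternative is kubectl rollout restart, which replaces pods gradually while respecting replica counts and surge settings. Below is a minimal sketch, assuming a hypothetical list of affected namespaces:
#!/usr/bin/env bash
# Roll every Deployment in each affected namespace; adjust the list for your cluster.
# Workloads managed by StatefulSets or DaemonSets will need the equivalent commands.
for ns in team-a team-b; do   # hypothetical namespaces containing affected sidecars
  for deploy in $(kubectl get deployments -n "${ns}" -o name); do
    kubectl rollout restart -n "${ns}" "${deploy}"
    kubectl rollout status -n "${ns}" "${deploy}" --timeout=5m
  done
done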
Another way to resolve this is to temporarily increase the gRPC message size limit above the size of the current message. You can do this by editing your cluster.yaml to contain the environment variable below. After adding this to your cluster.yaml, you will need to run './konvoy deploy addons -y' to apply the changes to the cluster:
istioOperator:
  components:
    pilot:
      k8s:
        env:
        - name: ISTIO_GPRC_MAXRECVMSGSIZE
          value: "<desired-value>"
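The new limit must exceed the message size reported in the sidecar error (55215931 bytes in the example above). As an illustration only, a 64 MiB limit would look like this:
istioOperator:
  components:
    pilot:
      k8s:
        env:
        - name: ISTIO_GPRC_MAXRECVMSGSIZE
          value: "67108864"  # illustrative 64 MiB; must be larger than the reported message size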
Please note that when you configure this value, you must surround it with quotes. If you do not, you will see logging similar to the following in your Operator pod:
error klog k8s.io/client-go@v0.20.1/tools/cache/reflector.go:167:
Failed to watch *v1alpha1.IstioOperator:failed to list *v1alpha1.IstioOperator:
v1alpha1.IstioOperatorList.Items: []v1alpha1.IstioOperator: v1alpha1.IstioOperator.Status:
Spec: unmarshalerDecoder: json: cannot unmarshal number into Go value of type string, error found in #10 byte of ...|":"1.9.1"},"status":|..., bigger context ...|ocker.io/istio","profile":"default","tag":"1.9.1"},"status":{"componentStatus":{"Base":{"status":"HE|...
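The root cause of this error is that an unquoted value is decoded as a JSON number where the operator expects a string. Side by side, using the same illustrative 64 MiB value:
env:
- name: ISTIO_GPRC_MAXRECVMSGSIZE
  value: 67108864      # wrong: parsed as a number, fails to unmarshal into a string
- name: ISTIO_GPRC_MAXRECVMSGSIZE
  value: "67108864"    # correct: quoted, decoded as a string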
If editing cluster.yaml is not an option for your organization, you can instead manually edit the IstioOperator resource to include the variable:
kubectl edit istiooperators.install.istio.io -n istio-system istio-default
...
spec:
  components:
    cni:
      enabled: true
      namespace: kube-system
    ingressGateways:
    - enabled: true
      k8s:
        hpaSpec:
          minReplicas: 2
      name: istio-ingressgateway
    pilot:
      k8s:
        env:
        - name: ISTIO_GPRC_MAXRECVMSGSIZE
          value: "<desired-value>"
...
After this, please monitor your istiod and istio-operator pods to validate that the environment variable has been applied; a quick check is shown below. If you cannot make either of these changes, the final option is to upgrade your cluster to DKP 2.2.x. This bug was fixed in Istio 1.9.3, and DKP 2.2.x utilizes Istio 1.11.6.
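As a quick check that istiod picked up the variable, you can read it back from the deployment; a minimal sketch, assuming the default istiod deployment in the istio-system namespace:
# Print the configured value of ISTIO_GPRC_MAXRECVMSGSIZE on the istiod deployment.
kubectl -n istio-system get deployment istiod \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="ISTIO_GPRC_MAXRECVMSGSIZE")].value}'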