When you add nodes to a Konvoy cluster that already has 200 nodes, you may hit a Calico limitation and receive an error that resembles the following:
Error: spec.nodePools: Invalid value: ... Calico route reflector pool (node pool with label 'dedicated:route-reflector') with at least 2 nodes must be specified when there are more than 200 nodes
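The rule stated in this error message can be sketched in a few lines of Python. This is only an illustration of the condition, not Konvoy's actual validation code: the function name and the shape of the `pools` data are hypothetical, and it assumes that every node in every pool counts toward the 200-node threshold.

```python
# Illustrative only: not Konvoy's actual validation code.
MESH_NODE_LIMIT = 200       # threshold quoted in the error message
MIN_ROUTE_REFLECTORS = 2    # minimum pool size quoted in the error message


def route_reflector_rule_ok(pools):
    """Return True if the node pools satisfy the rule in the error message.

    `pools` is a hypothetical structure: a list of dicts, each with a
    'count' (int) and a 'labels' dict.
    """
    total_nodes = sum(pool["count"] for pool in pools)
    if total_nodes <= MESH_NODE_LIMIT:
        return True  # small enough for the default node-to-node mesh
    # Above the limit, some pool labelled dedicated=route-reflector
    # must contain at least MIN_ROUTE_REFLECTORS nodes.
    return any(
        pool.get("labels", {}).get("dedicated") == "route-reflector"
        and pool["count"] >= MIN_ROUTE_REFLECTORS
        for pool in pools
    )


workers_only = [{"count": 201, "labels": {}}]
with_reflectors = workers_only + [
    {"count": 3, "labels": {"dedicated": "route-reflector"}}
]
print(route_reflector_rule_ok(workers_only))     # False
print(route_reflector_rule_ok(with_reflectors))  # True
```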
By default, Calico runs a full node-to-node BGP mesh, in which every node peers with every other node. This does not work once the cluster grows beyond 200 nodes. To get around this limitation, you need to configure a set of nodes as dedicated route reflectors. More detail about this Calico limitation can be found here:
To provision route reflector nodes in Konvoy, you must set them up as a separate nodePool. This will require editing two sections of your cluster.yaml.
First, declare the necessary labels and taints of the route-reflector nodePool in the ClusterConfiguration section:
kind: ClusterConfiguration
apiVersion: konvoy.mesosphere.io/v1beta2
spec:
  nodePools:
  - name: route-reflector
    labels:
    - key: dedicated
      value: route-reflector
    taints:
    - key: dedicated
      value: route-reflector
      effect: NoExecute
Next, configure the nodePool itself in the ClusterProvisioner section. A "count" of at least 2 route-reflector nodes is required, but we recommend at least 3. This example contains the syntax for AWS, but you may need to change it based on your environment.
kind: ClusterProvisioner
spec:
  nodePools:
  - count: 3
    machine:
      imagefsVolumeEnabled: true
      imagefsVolumeSize: 160
      imagefsVolumeType: gp2
      rootVolumeSize: 80
      rootVolumeType: gp2
      type: m4.4xlarge
    name: route-reflector
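Because the route-reflector pool is described in two separate sections, it is easy for them to drift apart. The following Python sketch shows one way to sanity-check the two sections before provisioning. It is not part of Konvoy: the function name is hypothetical, the inputs are plain lists of dicts mirroring the `nodePools` entries above (loading cluster.yaml is left out), and it assumes the two sections are linked by the pool `name`.

```python
# Illustrative pre-flight check; not a Konvoy tool or API.
def route_reflector_problems(config_pools, provisioner_pools,
                             name="route-reflector"):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []

    # ClusterConfiguration entry: must carry the label and the taint.
    cfg = next((p for p in config_pools if p.get("name") == name), None)
    if cfg is None:
        problems.append(f"ClusterConfiguration has no nodePool '{name}'")
    else:
        labels = {(l.get("key"), l.get("value")) for l in cfg.get("labels", [])}
        if ("dedicated", "route-reflector") not in labels:
            problems.append("label dedicated=route-reflector is missing")
        taints = {(t.get("key"), t.get("value"), t.get("effect"))
                  for t in cfg.get("taints", [])}
        if ("dedicated", "route-reflector", "NoExecute") not in taints:
            problems.append("taint dedicated=route-reflector:NoExecute is missing")

    # ClusterProvisioner entry: at least 2 nodes are required.
    prov = next((p for p in provisioner_pools if p.get("name") == name), None)
    if prov is None:
        problems.append(f"ClusterProvisioner has no nodePool '{name}'")
    elif prov.get("count", 0) < 2:
        problems.append("route-reflector count must be at least 2")

    return problems


config_pools = [{
    "name": "route-reflector",
    "labels": [{"key": "dedicated", "value": "route-reflector"}],
    "taints": [{"key": "dedicated", "value": "route-reflector",
                "effect": "NoExecute"}],
}]
provisioner_pools = [{"name": "route-reflector", "count": 3}]
print(route_reflector_problems(config_pools, provisioner_pools))  # []
```

An empty list means the two sections agree with the example configuration shown above; any other result lists what to fix before provisioning.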
After you have edited both sections of cluster.yaml, apply the change by running "./konvoy provision" followed by "./konvoy deploy".
Be aware of a known issue: in some versions of Konvoy, Calico's readiness timeout values are set too low by default.
After the deployment has been pushed to your cluster, manually adjust Calico's timeout values again, because the deploy operation may have overwritten them. Doing so avoids Calico downtime. More information on this issue can be found here: