We would like to pack as many pods as possible onto each node in our cluster, to decrease the number of nodes in some of our environments. I saw the https://github.com/kubernetes-sigs/descheduler HighNodeUtilization strategy, which seems to fit the bill for what we need. However, it seems the cluster needs the MostAllocated scoring strategy for this to work.
I believe the kube-scheduler in EKS cannot be configured. How, then, do I configure the MostAllocated scoring strategy?
Better yet, how do I configure this automated packing of pods onto as few nodes as possible without using the Descheduler?
I tried deploying the Descheduler as-is, without the MostAllocated scoring strategy configured. Unsurprisingly, it did not produce the expected results.
Much of my digging online pointed to creating a custom scheduler, but I have found few clear resources on how to do so.
EKS does not provide the ability to override the default scheduler configuration, which means that configuring the `default-scheduler` profile with the `MostAllocated` scoring strategy is not an option. However, you can run your own scheduler alongside the default scheduler, and that one can be configured however you like. Once you create a custom scheduler, you can configure it with the `MostAllocated` scoring strategy and then instruct your workloads to use it. In order to run multiple schedulers, you have to set up several Kubernetes objects, which are documented in the guide linked above.
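Among those supporting objects, the custom scheduler needs a ServiceAccount and bindings to the built-in scheduler ClusterRoles so it has permission to schedule pods. A minimal sketch, assuming the scheduler is named `custom-scheduler` (adjust names to your own conventions):

```yaml
# ServiceAccount the custom scheduler's deployment will run as
apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-scheduler
  namespace: kube-system
---
# Grant the same permissions the default scheduler has
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-as-kube-scheduler
subjects:
  - kind: ServiceAccount
    name: custom-scheduler
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
# Needed so the scheduler can bind volumes during scheduling
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-as-volume-scheduler
subjects:
  - kind: ServiceAccount
    name: custom-scheduler
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
```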
The deployment will use the standard `kube-scheduler` image provided by Google, unless you'd like to build your own (I wouldn't recommend it).

Major note: ensure your version of `kube-scheduler` is the same as the version of the control plane. This will not work otherwise.
In addition, ensure that your version of `kube-scheduler` is compatible with the API version of the configuration objects you use to configure the scheduler profile: `v1beta2` is safe for `v1.22.x` -> `v1.24.x`, but only `v1beta3` or `v1` is safe for `v1.25+`.
For example, here's a working version of a deployment manifest and config map used to create a custom scheduler compatible with k8s `v1.22.x`. Note you'll still have to create the other objects for this to work:
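A sketch of what such a config map and deployment can look like, assuming the scheduler is named `custom-scheduler`; the config map name, mount path, and the exact image tag `v1.22.17` are illustrative, so match the tag to your control-plane patch version:

```yaml
# KubeSchedulerConfiguration (v1beta2, matching k8s v1.22.x) with MostAllocated scoring
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-scheduler-config
  namespace: kube-system
data:
  config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
      - schedulerName: custom-scheduler
        pluginConfig:
          - name: NodeResourcesFit
            args:
              scoringStrategy:
                type: MostAllocated
                resources:
                  - name: cpu
                    weight: 1
                  - name: memory
                    weight: 1
---
# Deployment running the stock kube-scheduler image with the config above
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-scheduler
  namespace: kube-system
  labels:
    component: custom-scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      component: custom-scheduler
  template:
    metadata:
      labels:
        component: custom-scheduler
    spec:
      serviceAccountName: custom-scheduler  # must be bound to the scheduler ClusterRoles
      containers:
        - name: kube-scheduler
          image: registry.k8s.io/kube-scheduler:v1.22.17  # match your control-plane version
          command:
            - kube-scheduler
            - --config=/etc/kubernetes/custom-scheduler/config.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/kubernetes/custom-scheduler
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: custom-scheduler-config
```

Workloads then opt in by setting `schedulerName: custom-scheduler` in their pod spec; pods without it keep using the default scheduler.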