Using the code below, it is possible to create a Dask Kubernetes cluster in Azure AKS.
It uses a remote scheduler (dask.config.set({"kubernetes.scheduler-service-type": "LoadBalancer"})) and works perfectly.
To use virtual nodes, uncomment the line extra_pod_config=virtual_config (which follows this official example).
With virtual nodes enabled, it fails with the following error:
ACI does not support providing args without specifying the command. Please supply both command and args to the pod spec.
This is tied to the pod spec passing containers: args: [dask-scheduler] without a matching command.
Which containers: command: should I supply to fix this issue?
Thank you
import dask
from dask.distributed import Client
from dask_kubernetes import KubeCluster, KubeConfig, make_pod_spec

image = "daskdev/dask"
cluster = "aks-cluster1"

dask.config.set({"kubernetes.scheduler-service-type": "LoadBalancer"})
dask.config.set({"distributed.comm.timeouts.connect": 180})

virtual_config = {
    "nodeSelector": {
        "kubernetes.io/role": "agent",
        "beta.kubernetes.io/os": "linux",
        "type": "virtual-kubelet",
    },
    "tolerations": [
        {"key": "virtual-kubelet.io/provider", "operator": "Exists"},
    ],
}

pod_spec = make_pod_spec(
    image=image,
    # extra_pod_config=virtual_config,
    memory_limit="2G",
    memory_request="2G",
    cpu_limit=1,
    cpu_request=1,
    threads_per_worker=1,  # same as cpu
)

# az aks get-credentials --name aks-cluster1 --resource-group resource_group1
# cp ~/.kube/config ./aksconfig.yaml
auth = KubeConfig(config_file="./aksconfig.yaml", context=cluster)

cluster = KubeCluster(
    pod_spec, auth=auth, deploy_mode="remote", scheduler_service_wait_timeout=180
)
client = Client(cluster)
The reason is a virtual kubelet safeguard: in the pod configuration, Dask uses args to start a scheduler or worker, but no command is supplied.
So I explicitly added the entrypoint command (command_entrypoint_explicit) and it works: pods are created successfully.

Second problem: network name resolution. Workers fail to connect to the scheduler by network name:
tcp://{name}.{namespace}:{port}
although
tcp://{name}.{namespace}.svc.cluster.local:{port}
works. I edited this in dask_kubernetes.core.Scheduler.start and it works.
Another option is the virtual_config below. The code below is a complete solution.
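
As an illustration of the two fixes only (not the full cluster setup), here is a minimal sketch that works on plain dictionaries, so it runs without a cluster. The helper names add_explicit_command and scheduler_address are hypothetical, and the entrypoint value is an assumption about the daskdev/dask image; check the ENTRYPOINT of the image you actually deploy.

```python
import copy

# Assumption: entrypoint of the daskdev/dask image; verify against your image.
ASSUMED_ENTRYPOINT = ["tini", "-g", "--", "/usr/bin/prepare.sh"]


def add_explicit_command(pod_spec, command=ASSUMED_ENTRYPOINT):
    """Return a copy of pod_spec in which every container that has args
    but no command also gets an explicit command (hypothetical helper)."""
    patched = copy.deepcopy(pod_spec)
    for container in patched["spec"]["containers"]:
        if container.get("args") and not container.get("command"):
            container["command"] = list(command)
    return patched


def scheduler_address(name, namespace, port, cluster_domain="cluster.local"):
    """Build the fully qualified service address, i.e. the form that
    resolved from the virtual node (hypothetical helper)."""
    return f"tcp://{name}.{namespace}.svc.{cluster_domain}:{port}"


# A pod spec shaped like the one that ACI rejected: args without command.
example_pod = {
    "spec": {
        "containers": [
            {"name": "dask", "image": "daskdev/dask", "args": ["dask-scheduler"]}
        ]
    }
}

patched_pod = add_explicit_command(example_pod)
print(patched_pod["spec"]["containers"][0]["command"])
print(scheduler_address("dask-scheduler", "default", 8786))
```

With make_pod_spec, a dictionary such as {"command": [...]} can be passed through its extra_container_config argument instead of patching the spec by hand; whether that alone is enough depends on the dask_kubernetes version in use.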