Dask cluster is not starting up

I am trying to start a Dask cluster, but it fails with the following error:

Timed out trying to connect to 'tcp://100.100.160.25:2323' after 10 s:
Timed out trying to connect to 'tcp://100.100.160.25:2323' after 10 s: 
connect() didn't finish in time

There is 1 solution below

I experienced something similar while building a temporary ECS/Fargate cluster via dask-cloudprovider. The cause ultimately turned out to be the network architecture. Here are some recommendations:

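  1. Make sure your network firewall rules allow traffic between the client, the scheduler, and the workers. In AWS these rules live in a "Security Group" (attached to the instances or tasks, not to IAM roles); I'm not positive about the equivalent on other platforms.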
  2. Ensure your network routing tables are correctly set up for your internet gateways and allow ingress and egress for your nodes; a misconfiguration here can be insecure as well as broken. If you are trying to run in a private subnet, check that the NAT gateway is properly set up, along with any load balancers you may have.
  3. I see that your system is trying to connect on port 2323. As far as I know, the Dask scheduler listens on port 8786 by default (8787 is the dashboard), so I'd look into that if you're unsure.
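
Before digging through cloud consoles, it can help to check whether the address from the error message is reachable at all. Below is a minimal sketch using only the Python standard library; the `probe` helper and its return strings are my own naming for illustration, not part of Dask.

```python
import socket
from urllib.parse import urlparse


def probe(address: str, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connection to an address like 'tcp://host:port'.

    Hypothetical helper (not a Dask API). Returns one of:
    'open', 'refused', 'timeout', or 'error: <reason>'.
    """
    parsed = urlparse(address)
    host, port = parsed.hostname, parsed.port
    if host is None or port is None:
        raise ValueError(f"expected tcp://host:port, got {address!r}")
    try:
        # Only checks that the TCP handshake completes; it does not speak
        # the Dask protocol.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Run something like `probe("tcp://100.100.160.25:2323")` from the machine where the client (or a worker) runs. As a rough guide, "timeout" usually points at packets being silently dropped (firewall rules or routing), while "refused" usually means the host is reachable but nothing is listening on that port.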

This problem is pretty hard to nail down, so I'd recommend a fair amount of trial and error. Check the logs on the scheduler and on each worker for further hints about what could be causing the issue.