I have a Mesos cluster on the cloud with 3 masters and 10 slaves.
All the slaves appear active, but the Mesos master UI shows 0 resources allocated.
On the home page I can see 10 activated agents, but 9 of them are unreachable.
The jobs I try to run on the cluster get stuck in the RUNNING state forever.
Does Spark need to be up and running on every slave (by running start-slave.sh there), or does Mesos take care of that? What could be wrong?
There are no blocked ports on the machines.
Edit:
It looks like the machine that launches the application is able to connect to the Master:
I0902 18:01:31.472944 14997 zookeeper.cpp:262] A new leading master (UPID=master@X:5000) is detected
I0902 18:01:31.473047 14993 sched.cpp:343] New master detected at master@X:5000
I0902 18:01:31.473348 14993 sched.cpp:363] No credentials provided. Attempting to register without authentication
I0902 18:01:31.475391 14994 sched.cpp:751] Framework registered with 9984df9d-0efb-4f83-bf6a-0cecb19b1a39-0002
It also tries to start a task, but the task gets stuck, and this behavior repeats in a cycle.



Two solutions for this problem:
Install the Hadoop client on all Mesos slaves.
Put spark-x.y.z.tar.gz in /path/in/os/.
Otherwise, in the Mesos UI go to the agent tab -> sandbox -> stderr and check the error details.
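
As a rough illustration of the second option, here is a minimal sketch that builds a spark-submit command pointing the executors at the tarball through the `spark.executor.uri` setting. It assumes the tarball sits at the same local path on every agent; the ZooKeeper hosts, the `/mesos` znode, the application file name and the helper `build_spark_submit_args` are all placeholders made up for this example, not values from the question.

```python
import shlex


def build_spark_submit_args(zk_hosts, executor_uri, app_path, zk_path="/mesos"):
    """Return a spark-submit argument list for a Mesos cluster found via ZooKeeper.

    Hypothetical helper for illustration only; adjust hosts and paths to your setup.
    """
    # Spark accepts a Mesos master URL of the form mesos://zk://host1:port,host2:port/path
    master_url = "mesos://zk://" + ",".join(zk_hosts) + zk_path
    return [
        "spark-submit",
        "--master", master_url,
        # Tells each Mesos agent where to fetch the Spark distribution from
        "--conf", "spark.executor.uri=" + executor_uri,
        app_path,
    ]


args = build_spark_submit_args(
    zk_hosts=["zk1:2181", "zk2:2181", "zk3:2181"],  # placeholder ZooKeeper hosts
    executor_uri="/path/in/os/spark-x.y.z.tar.gz",  # path taken from the answer above
    app_path="my_app.py",                           # placeholder application
)

# Print the command with shell-safe quoting so it can be copied into a terminal
print(" ".join(shlex.quote(a) for a in args))
```

If the tarball is stored on HDFS instead, the same setting would take an hdfs:// URI, which is where the Hadoop client on the slaves from the first option comes in.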