Running Hadoop/Storm tasks on Apache Marathon


I recently came across Apache Mesos and successfully deployed my Storm topology over Mesos.

I want to try running Storm topologies/Hadoop jobs on Apache Marathon (I had issues running Storm directly on Apache Mesos using the mesos-storm framework).

I couldn't find any tutorial/article listing the steps to launch Hadoop/Spark tasks from Apache Marathon.

It would be great if anyone could provide help or information on this topic (ideally a JSON job definition for Marathon that launches a Storm/Hadoop job).

Thanks a lot


2 Answers


Marathon is intended for long-running services, so you could use it to start your JobTracker or Spark scheduler, but you're better off launching actual batch workloads like Hadoop/Spark jobs on a batch framework such as Chronos (https://github.com/airbnb/chronos). Marathon will restart tasks when they complete or fail, whereas Chronos (a distributed cron with dependency support) lets you set up scheduled jobs and complex workflows.
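To illustrate the long-running case, here is a minimal sketch of a Marathon app definition for a service like Storm Nimbus (the app id, binary path, and resource sizes are assumptions for the example, not from any particular setup):

```json
{
  "id": "/storm-nimbus",
  "cmd": "/opt/storm/bin/storm nimbus",
  "cpus": 1.0,
  "mem": 2048,
  "instances": 1
}
```

You would POST this to Marathon's `/v2/apps` endpoint; Marathon then keeps the process running and restarts it if it exits, which is exactly the behavior you want for a scheduler daemon but not for a batch job.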

While a little outdated, the following tutorial gives a good example.

http://mesosphere.com/docs/tutorials/etl-pipelines-with-chronos-and-hadoop/
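For the batch side, a Chronos job definition might look roughly like the sketch below (the job name, jar path, owner address, and schedule are hypothetical placeholders; the `schedule` field uses ISO 8601 repeating-interval notation):

```json
{
  "name": "hourly-hadoop-job",
  "command": "hadoop jar /opt/jobs/wordcount.jar /input /output",
  "schedule": "R/2015-06-01T00:00:00Z/PT1H",
  "epsilon": "PT15M",
  "owner": "ops@example.com",
  "cpus": 1.0,
  "mem": 1024
}
```

Unlike Marathon, Chronos runs the command on schedule, treats a clean exit as success, and does not try to keep the task alive afterwards.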


Thanks for your reply. I went ahead and deployed a Storm Docker cluster on Apache Mesos with Marathon, using HAProxy for service discovery. This setup lets the services (Nimbus, ZooKeeper, etc.) talk to each other through ports, so adding multiple instances of a service is not a problem: the cluster finds them by port and load-balances requests across all instances of a service. The following GitHub project has the Marathon recipes and Docker images: https://github.com/obaidsalikeen/storm-marathon
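A Marathon app definition for one of these Dockerized services could be sketched as follows (the image name is a hypothetical placeholder; port 6627 is Nimbus's default Thrift port, and `hostPort: 0` asks Marathon to assign a host port dynamically, which is what a port-based discovery layer like HAProxy can then pick up):

```json
{
  "id": "/storm/nimbus",
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "example/storm-nimbus:latest",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 6627, "hostPort": 0, "protocol": "tcp" }
      ]
    }
  },
  "cpus": 1.0,
  "mem": 2048,
  "instances": 1
}
```

Scaling a service is then just a matter of raising `instances`; the load balancer routes to whichever hosts Marathon placed the containers on.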