What is the Oozie equivalent for Spark?


We have very complex pipelines that we need to compose and schedule. I see that the Hadoop ecosystem has Oozie for this. What are the choices for Spark-based jobs when I am running Spark on Mesos or in Standalone mode and don't have a Hadoop cluster?


There are 2 best solutions below

srinath_perera (best answer)

Unlike with Hadoop, it is pretty easy to chain things together with Spark, so writing a Spark Scala script might be enough. My first recommendation is trying that.
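To illustrate the idea of chaining stages from a single driver script, here is a minimal sketch in plain Python. It is not Spark code: the stage functions `extract`, `clean`, and `aggregate` are hypothetical stand-ins for Spark jobs, and `run_pipeline` is a helper invented for this example. It only shows how a dependency map can decide the execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(stages, deps):
    """Run each stage after its parents, passing the parents' outputs as arguments."""
    results = {}
    # deps maps a stage name to the list of stages it depends on.
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[parent] for parent in deps.get(name, ())]
        results[name] = stages[name](*inputs)
    return results


# Hypothetical stand-ins for Spark jobs; in a real driver these would
# build and run DataFrame or RDD transformations instead.
def extract():
    return [1, 2, 3, 4]


def clean(rows):
    return [r for r in rows if r % 2 == 0]


def aggregate(rows):
    return sum(rows)


stages = {"extract": extract, "clean": clean, "aggregate": aggregate}
deps = {"extract": [], "clean": ["extract"], "aggregate": ["clean"]}

results = run_pipeline(stages, deps)
print(results["aggregate"])
```

This covers ordering only. Retries, time-based triggers, and monitoring are what a dedicated scheduler adds on top.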

If you would like to keep it SQL-like, you can try Spark SQL.

If you have a really complex flow, it is worth looking at Google Dataflow: https://github.com/GoogleCloudPlatform/DataflowJavaSDK.

Rakesh

Oozie can be used when running on YARN. Spark itself has no built-in workflow scheduler, so you are free to choose any scheduler that works in cluster mode.

For Mesos, I feel Chronos would be the right choice; see the Chronos documentation for more info.
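As a rough sketch of what scheduling a Spark job through Chronos might involve, the snippet below builds a job definition as JSON. The field names (`name`, `command`, `schedule`, `epsilon`, `owner`) and the ISO 8601 repeating-interval schedule follow the Chronos job format as I understand it, so check them against the documentation for your Chronos version. The job name, jar path, class name, Mesos master address, and owner are all made-up placeholders, and `chronos_job` is a helper invented for this example.

```python
import json
from datetime import datetime, timezone


def chronos_job(name, command, start, period, owner):
    """Build a Chronos-style job definition (field names assumed, verify for your version)."""
    # Chronos schedules are ISO 8601 repeating intervals: R/<start>/<period>.
    # "R" with no count means repeat indefinitely. Pass a timezone-aware start.
    start_utc = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "name": name,
        "command": command,
        "schedule": f"R/{start_utc}/{period}",
        "epsilon": "PT30M",  # how late the job may still start
        "owner": owner,
    }


job = chronos_job(
    name="nightly-spark-etl",  # placeholder name
    # Placeholder spark-submit command; master URL, class, and jar are assumptions.
    command=(
        "spark-submit --master mesos://mesos-master:5050 "
        "--class com.example.Etl /opt/jobs/etl.jar"
    ),
    start=datetime(2016, 1, 1, 2, 0, 0, tzinfo=timezone.utc),
    period="P1D",  # run once a day
    owner="data-team@example.com",
)

payload = json.dumps(job, indent=2)
print(payload)
```

The resulting JSON would then be submitted to the Chronos REST API. The exact endpoint depends on the Chronos version, so it is left out here.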