Difference between running a Spark application as a standalone Java application vs. with spark-submit / SparkLauncher?

I am exploring different options for packaging a Spark application, and I am confused about which mode is best and how the following modes differ:

  1. Submit the Spark application's jar with spark-submit.
  2. Build a fat jar from the Spark Gradle project and run it as a standalone Java application.
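One practical difference between the two options is where the master URL comes from: spark-submit supplies it (via `--master`), while a fat jar started with `java -jar` has to decide for itself. A common way to let one jar work both ways is to avoid hardcoding the master and fall back to a local one only when none was supplied. This is a minimal sketch that assumes spark-submit exposes its settings to the driver as `spark.*` JVM system properties (as it does in client mode); `LaunchMode` and `resolveMaster` are hypothetical names.

```scala
object LaunchMode {
  // Assumption: when launched through spark-submit (client mode), --master reaches
  // the driver JVM as the spark.master system property. A plain `java -jar` run
  // sets nothing unless you pass -Dspark.master=... yourself, so we fall back.
  def resolveMaster(props: collection.Map[String, String],
                    fallback: String = "local[*]"): String =
    props.getOrElse("spark.master", fallback)
}
```

In the application this could be used as `SparkSession.builder().master(LaunchMode.resolveMaster(sys.props)).getOrCreate()`, so the same jar runs locally in a container and still honours `--master` under spark-submit.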

I have tried both approaches. My requirement is to package the Spark application inside a Docker container. Running the fat jar looks easier to me, but as I am a newbie I don't know what restrictions I may face with the fat-jar approach (leaving aside that the fat jar may grow in size).
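One concrete restriction of the `java -jar` route: the fat jar's own JVM is the driver, so settings that only take effect when the driver JVM is launched cannot be applied from `SparkConf` in application code. The Spark configuration docs call this out, for client mode, for the driver properties listed below. The helper is a hypothetical sketch for flagging such keys, not a Spark API.

```scala
object DriverLaunchSettings {
  // Driver properties that the Spark configuration docs say must not be set through
  // SparkConf in client mode, because the driver JVM has already started by then.
  // A fat jar run with `java -jar` is its own driver JVM, so these belong on the
  // java command line instead (e.g. -Xmx4g rather than spark.driver.memory=4g).
  val launchTimeKeys: Set[String] = Set(
    "spark.driver.memory",
    "spark.driver.extraClassPath",
    "spark.driver.extraJavaOptions",
    "spark.driver.extraLibraryPath"
  )

  // Returns the keys in `settings` that would have no effect if set from application code.
  def ineffectiveWhenSetInCode(settings: Map[String, String]): Set[String] =
    settings.keySet.intersect(launchTimeKeys)
}
```

With spark-submit, the same values can be passed as `--driver-memory` and friends because spark-submit starts the driver JVM itself; in a Dockerfile that uses `java -jar`, they have to go into the `java` command.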

Could you please share your input?

Is it possible to set up a Spark cluster, including the driver and executors, programmatically?

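```scala
import org.apache.kafka.common.serialization.{LongDeserializer, StringDeserializer}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

try {
  val conf = new SparkConf()
  conf.setMaster("local[*]")
  // The config key is spark.submit.deployMode ("deploy-mode" is only the CLI flag name).
  // With a local master everything runs in this one JVM, so neither the deploy mode
  // nor spark.executor.instances changes anything here.
  conf.set("spark.submit.deployMode", "client")
  conf.set("spark.executor.instances", "2")
  conf.set("spark.driver.bindAddress", "127.0.0.1")
  conf.setAppName("local-spark-kafka-consumer")

  // Master and app name already come from conf; setting them again on the
  // builder is redundant, and whichever value is applied last wins.
  val sparkSession = SparkSession
    .builder()
    .config(conf)
    .getOrCreate()

  val sc = sparkSession.sparkContext

  val ssc = new StreamingContext(sc, Seconds(5))
  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092,localhost:9093",
    "key.deserializer" -> classOf[LongDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "consumerGroup10",
    "auto.offset.reset" -> "earliest",
    "max.poll.records" -> "1",
    "enable.auto.commit" -> (false: java.lang.Boolean))

  val topics = Array("topic1")
  // The key type must match the key deserializer (LongDeserializer yields java.lang.Long).
  val stream = KafkaUtils.createDirectStream[java.lang.Long, String](...)
  ssc.start()
  ssc.awaitTermination()
} catch {
  case e: Exception => println(e)
}
```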
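The snippet sets the deploy mode in code, but `deploy-mode` is the name of a spark-submit command-line flag (`--deploy-mode`), not a configuration key. Many spark-submit flags are shorthand for ordinary `spark.*` properties, which is why they can be set programmatically at all. The table below is a small, hand-picked subset of those mappings, and `SubmitFlagKeys.toConf` is a hypothetical helper that translates flag/value pairs into the corresponding keys.

```scala
object SubmitFlagKeys {
  // A few spark-submit flags and the configuration keys they set.
  // Illustrative subset only, not the full list.
  val flagToConfKey: Map[String, String] = Map(
    "--master"          -> "spark.master",
    "--deploy-mode"     -> "spark.submit.deployMode",
    "--name"            -> "spark.app.name",
    "--num-executors"   -> "spark.executor.instances",
    "--executor-memory" -> "spark.executor.memory",
    "--executor-cores"  -> "spark.executor.cores",
    "--driver-memory"   -> "spark.driver.memory"
  )

  // Translates Seq("--master", "local[*]", ...) into config key/value pairs.
  // Assumes every flag is followed by exactly one value; unknown flags are ignored.
  def toConf(args: Seq[String]): Map[String, String] =
    args.grouped(2).collect {
      case Seq(flag, value) if flagToConfKey.contains(flag) => flagToConfKey(flag) -> value
    }.toMap
}
```

Setting these keys in code does not create a cluster, though: the master URL must point at a cluster manager that already exists, and with `local[*]` the driver and executor share a single JVM.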
Answer

Using fat jars to deploy Spark jobs is an old, even ancient, practice. You can do this, trust me :) Just be careful about what you put inside it, for example dependency versions that clash with the ones Spark itself ships.
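One way to check what actually ended up in a fat jar is to ask the JVM, at runtime, where a class was loaded from. This is a hypothetical helper, not part of Spark: `locate` returns the jar (or directory) a class came from, which helps when two dependencies bundle different versions of the same library, or when confirming that Spark itself is packaged for a plain `java -jar` run.

```scala
object ClasspathCheck {
  // Returns the jar or directory a class was loaded from, or None if the class
  // is not on the classpath or has no code source (e.g. JDK bootstrap classes).
  def locate(className: String): Option[String] =
    try {
      // initialize = false: load the class without running its static initializers
      val cls = Class.forName(className, false, getClass.getClassLoader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      case _: ClassNotFoundException => None
      case _: LinkageError           => None
    }
}
```

For example, `ClasspathCheck.locate("org.apache.spark.SparkContext")` returns `None` if Spark was not bundled into the jar, and otherwise shows which jar provided it.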