Is it possible to achieve a dynamic batch size in Spark Streaming?


To keep the code simple, I am willing to restart the Spark Streaming application to apply a new batch size, but I need to keep the previous progress (losing the batch currently being processed is acceptable).

If I use checkpointing in Spark Streaming, configuration changes don't take effect when the application restarts, because the context is restored from the checkpoint.
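For context, a minimal sketch of why this happens: when `StreamingContext.getOrCreate` finds an existing checkpoint, it rebuilds the context (including its batch interval) from the checkpoint data and never calls the creation function, so a changed setting is ignored. The path `checkpointDir` and the function `createContext` are hypothetical names for illustration; this requires a Spark runtime and is not runnable standalone.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "/tmp/streaming-checkpoint"  // hypothetical path

// Factory used only when no checkpoint exists yet.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("app-name")
  val ssc = new StreamingContext(conf, Seconds(10))  // batch interval fixed here
  ssc.checkpoint(checkpointDir)
  ssc
}

// On restart, if checkpointDir holds a checkpoint, the previous context -- with
// its original batch interval -- is restored and createContext() is not called,
// so a new batch-interval value would not take effect.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
```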

So I want to implement this feature by modifying the source code, but I don't know where to start. I would appreciate some guidance and an idea of how difficult it would be.

1 Answer
Since you are talking about the batch size, I'm assuming that you are asking about Spark Streaming (DStreams) and not Structured Streaming.

There is a way to programmatically set the value of your batch interval; see the StreamingContext API documentation.

The constructor of StreamingContext accepts a Duration object, which defines the batch interval.

You can hardcode the batch interval in the code, but then you have to rebuild the jar file every time you need to change it. Instead, read it from a config file; that way you don't need to rebuild the code each time.

Note: You have to set this property in the application's config file, not in Spark's config file.
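For example, with the Typesafe Config library commonly used in Spark applications, the application's `application.conf` on the classpath could hold the value. The key name `batch-interval` is simply the one read in the code in this answer:

```
# application.conf -- the application's own config file on the classpath
batch-interval = 10   # batch interval in seconds
```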

You can change the batch-interval value in the config file and restart the application; this will not cause any problems with respect to checkpointing.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.typesafe.config.{Config, ConfigFactory}

// Load the application's config file (e.g. application.conf on the classpath)
val config: Config = ConfigFactory.load()

val sparkConf: SparkConf = new SparkConf()
  .setAppName("app-name")
  .setMaster("app-master")

// Read the batch interval (in seconds) from the config file instead of hardcoding it
val ssc: StreamingContext = new StreamingContext(sparkConf, Seconds(config.getInt("batch-interval")))

Cheers!!