What is the use of "spark.streaming.blockInterval" in the Spark Streaming Direct API?

I want to understand what role "spark.streaming.blockInterval" plays in the Spark Streaming Direct API. As I understand it, "spark.streaming.blockInterval" is used to calculate the number of partitions, i.e. #partitions = (receivers * batchInterval) / blockInterval, but with the Direct API the number of Spark Streaming partitions is equal to the number of Kafka partitions.

How is "spark.streaming.blockInterval" used in the Direct API?

Best answer

spark.streaming.blockInterval:

Interval at which data received by Spark Streaming receivers is chunked into blocks of data before storing them in Spark.

KafkaUtils.createDirectStream() does not use a receiver, so spark.streaming.blockInterval has no effect on it.

With directStream, Spark Streaming will create as many RDD partitions as there are Kafka partitions to consume.
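
To make the difference concrete, here is a rough sketch of the arithmetic in plain Python. It is not Spark code: the function names `receiver_partitions` and `direct_partitions` are made up for illustration, and the receiver-based figure is only the approximate formula from the question (one partition per block, blocks cut every blockInterval).

```python
def receiver_partitions(num_receivers: int,
                        batch_interval_ms: int,
                        block_interval_ms: int) -> int:
    """Approximate partitions per batch for a receiver-based stream.

    Hypothetical helper: each receiver cuts one block every
    block_interval_ms, and each block becomes one partition of the batch RDD.
    """
    blocks_per_receiver = batch_interval_ms // block_interval_ms
    return num_receivers * blocks_per_receiver


def direct_partitions(num_kafka_partitions: int,
                      block_interval_ms: int) -> int:
    """Partitions per batch for a direct stream.

    Hypothetical helper: block_interval_ms is accepted only to show that it
    is ignored, because no receiver is involved.
    """
    return num_kafka_partitions


# 2 receivers, 2 s batches, 200 ms blocks (200 ms is the default blockInterval)
print(receiver_partitions(2, 2000, 200))  # 20

# 8 Kafka partitions: changing blockInterval makes no difference
print(direct_partitions(8, 200))  # 8
print(direct_partitions(8, 50))   # 8
```

So with a direct stream, tuning spark.streaming.blockInterval will not change parallelism; changing the number of Kafka partitions (or repartitioning the RDD) will.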