I am implementing the 'direct' approach for Kafka streaming in Spark 1.3.1: https://spark.apache.org/docs/1.3.1/streaming-kafka-integration.html

As I understand it, there are two values that 'auto.offset.reset' can be set to: "smallest" and "largest". The behavior I am observing (let me know if this is expected) is that "largest" starts fresh and receives any new incoming data, while "smallest" starts from offset 0 and reads to the end, but doesn't receive any new incoming data. Clearly it would be preferable to start from the beginning and also receive new incoming data.

I did see the access (in the docs) to the offsets that each batch is consuming, but I'm not sure how that could be helpful here. Thanks.
Offsets for Kafka Direct Approach in Spark 1.3.1
390 views, asked by joebuild

1 answer below
It looks like I was mistaken - with "smallest", the stream does in fact continue past the end of the existing backlog and receives new/incoming data as it arrives.
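For reference, a minimal sketch of this setup against the Spark 1.3.1 direct API, assuming a hypothetical broker address and topic name. It sets `auto.offset.reset` to "smallest" so the stream starts from the earliest available offset, and also shows how to read the per-batch offsets mentioned in the docs via `HasOffsetRanges`:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object DirectKafkaFromBeginning {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-kafka-example")
    val ssc = new StreamingContext(conf, Seconds(10))

    // "smallest" starts from the earliest retained offset; once the
    // backlog is consumed, the stream keeps receiving new messages.
    val kafkaParams = Map(
      "metadata.broker.list" -> "localhost:9092", // assumed broker address
      "auto.offset.reset"    -> "smallest")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic")) // hypothetical topic name

    stream.foreachRDD { rdd =>
      // The per-batch offsets are exposed on the underlying RDD; they are
      // useful for tracking progress in an external store if needed.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      offsetRanges.foreach { o =>
        println(s"${o.topic} partition ${o.partition}: ${o.fromOffset} -> ${o.untilOffset}")
      }
      // ... process the batch here ...
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This is a sketch, not a tested implementation: it requires a running Kafka broker and the `spark-streaming-kafka` 1.3.1 dependency, and the broker address and topic name above are placeholders.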