Kafka: how to know when all related messages have been consumed


Is there any way, in Kafka, to produce a message once several related messages have been consumed (without having to track this manually in application code)?

The use case is to take a huge file, split it into several chunks, publish one message per chunk to a topic, and, once all of those messages have been consumed, produce another message on a different topic announcing the result.

We can do it with a database or Redis to track the state, but I wonder whether there is a higher-level approach that relies only on the Kafka ecosystem.


There are 2 best solutions below

2
On

You can use ConsumerGroupCommand to check whether a given consumer group has finished processing all messages in a particular topic:

    $ kafka-consumer-groups --bootstrap-server broker_host:port --describe --group chunk_consumer

OR

    $ kafka-run-class kafka.admin.ConsumerGroupCommand ...

Zero lag on every partition indicates that the consumer group has consumed all messages currently in the topic and has committed its offsets.
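As a rough illustration of what "zero lag" means, here is a minimal Python sketch of the same check. It is not part of any Kafka client: `partition_lag` and `all_consumed` are hypothetical helper names, and the offset values are made up. In practice the committed and log-end offsets would come from the CLI output above or from an admin client. Treating a partition with no committed offset as starting at 0 is a simplifying assumption.

```python
from typing import Dict, Tuple

# (topic name, partition number); a plain tuple stands in for a client type.
TopicPartition = Tuple[str, int]


def partition_lag(committed: Dict[TopicPartition, int],
                  log_end: Dict[TopicPartition, int]) -> Dict[TopicPartition, int]:
    """Return lag per partition: log-end offset minus committed offset."""
    lag = {}
    for tp, end_offset in log_end.items():
        # Assumption: a partition without a committed offset is treated as
        # committed at 0, i.e. nothing has been consumed from it yet.
        lag[tp] = max(end_offset - committed.get(tp, 0), 0)
    return lag


def all_consumed(committed: Dict[TopicPartition, int],
                 log_end: Dict[TopicPartition, int]) -> bool:
    """True only when every partition of the topic has zero lag."""
    return all(value == 0 for value in partition_lag(committed, log_end).values())


# Hypothetical offsets for a two-partition "chunks" topic.
log_end = {("chunks", 0): 40, ("chunks", 1): 35}
committed = {("chunks", 0): 40, ("chunks", 1): 30}

print(partition_lag(committed, log_end))
print(all_consumed(committed, log_end))
```

Note that zero lag only proves completion if the producer has already published every chunk; a check made while chunks are still being written can report zero lag too early.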

Alternatively, you can subscribe to the internal __consumer_offsets topic and process its messages yourself, but using ConsumerGroupCommand seems like the more straightforward solution.

2
On

One approach is as follows:

  1. After consuming each chunk, the application should produce a status message (status "Consumed" plus the chunk number).
  2. A second application (for example, a Kafka Streams one) should aggregate these status messages and, once it has seen a status message for every chunk, produce the final message saying that the file has been processed.
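The aggregation step above could look roughly like the following Python sketch. It only models the counting logic in memory and does not talk to Kafka. `ChunkAggregator`, `FileProgress`, `on_status`, and the message fields are hypothetical names. It also assumes that every status message carries a file ID and the total number of chunks, which the answer does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FileProgress:
    total_chunks: int
    seen: Set[int] = field(default_factory=set)
    done: bool = False


class ChunkAggregator:
    """Tracks which chunks of each file have been reported as consumed."""

    def __init__(self) -> None:
        self._files: Dict[str, FileProgress] = {}

    def on_status(self, file_id: str, chunk_number: int,
                  total_chunks: int) -> Optional[dict]:
        """Record one status message; return the final message exactly once."""
        progress = self._files.setdefault(file_id, FileProgress(total_chunks))
        # A set ignores duplicates, so redelivered status messages are harmless.
        progress.seen.add(chunk_number)
        if not progress.done and len(progress.seen) == progress.total_chunks:
            progress.done = True
            return {"file_id": file_id, "status": "PROCESSED",
                    "chunks": progress.total_chunks}
        return None


aggregator = ChunkAggregator()
for chunk in (0, 1, 1, 2):  # chunk 1 is delivered twice on purpose
    final = aggregator.on_status("report.csv", chunk, total_chunks=3)
    if final is not None:
        print(final)  # this is where the final message would be produced
```

In a Kafka Streams application the same state would normally live in a state store, with status messages keyed by file ID so that all of a file's statuses reach the same instance.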