Does a Kafka consumer read messages from the active segment in the partition?


Let us say I have a partition (partition-0) with 4 segments that are closed and eligible for compaction. Assume compaction has run across all 4 segments, so none of them contains duplicate data for a key.

Now, there is also an active segment that has not been closed yet. If a consumer starts reading from partition-0 in the meantime, does it also read the messages in the active segment?

Note: My goal is to not provide duplicate data to the consumer for a particular key.

Best answer

Your concern is valid: the consumer will also read the messages from the active segment, and the active segment is never compacted. Log compaction does not guarantee that you have exactly one value for a particular key, but rather at least one.

Here is how Log Compaction is introduced in the documentation:

Log compaction ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition.
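To make the "at least one" guarantee concrete, here is a toy model in plain Python. It is only a sketch, not how the broker is implemented: the `compact` and `read_partition` helpers and the sample records are made up for illustration. Closed segments are compacted down to the latest record per key, while the active segment is returned untouched, so a consumer can still see several values for the same key.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(closed_segments: List[List[Record]]) -> List[Record]:
    """Toy compaction: keep only the highest-offset record per key
    across all closed segments (hypothetical helper)."""
    latest: Dict[str, Record] = {}
    for segment in closed_segments:
        for offset, key, value in segment:
            latest[key] = (offset, key, value)  # later offsets overwrite earlier ones
    return sorted(latest.values())  # back in offset order


def read_partition(closed_segments: List[List[Record]],
                   active_segment: List[Record]) -> List[Record]:
    """What a consumer reading from the start would see in this model:
    the compacted closed segments followed by the uncompacted active segment."""
    return compact(closed_segments) + list(active_segment)


closed = [
    [(0, "user-1", "a"), (1, "user-2", "b")],
    [(2, "user-1", "c")],
]
active = [(3, "user-1", "d"), (4, "user-1", "e")]

records = read_partition(closed, active)
keys = [key for _, key, _ in records]
print(records)
print("values seen for user-1:", keys.count("user-1"))
```

In this model the consumer receives three values for `user-1`: one surviving record from the compacted segments and two from the active segment.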

However, you can try to get compaction to run more frequently so that the active segment and the not-yet-compacted segments stay as small as possible. This comes at a cost, though, because running the log cleaner takes up resources.

Many topic-level configurations relate to log compaction. The most important ones are listed below; the full details are in the Kafka topic configuration documentation:

  • delete.retention.ms
  • max.compaction.lag.ms
  • min.cleanable.dirty.ratio
  • min.compaction.lag.ms
  • segment.bytes
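As a rough sketch of how these settings might be combined, the snippet below builds a set of topic-level overrides aimed at more aggressive compaction. The values are illustrative assumptions, not recommendations, and should be tuned against the extra log-cleaner load. The resulting comma-separated string has the `key=value,key=value` shape accepted by the `--add-config` option of `kafka-configs.sh`.

```python
# Illustrative topic-level overrides (assumed values, tune for your workload).
overrides = {
    # Lower dirty ratio: the log becomes eligible for cleaning sooner.
    "min.cleanable.dirty.ratio": "0.1",
    # Smaller segments: the active segment rolls (and becomes cleanable) sooner.
    "segment.bytes": str(1024 * 1024),
    # Upper bound on how long a message may stay ineligible for compaction.
    "max.compaction.lag.ms": "60000",
    # Lower bound on how long a message stays uncompacted.
    "min.compaction.lag.ms": "0",
    # How long delete markers (tombstones) are retained.
    "delete.retention.ms": "86400000",
}

add_config = ",".join(f"{name}={value}" for name, value in overrides.items())
print(add_config)
```

Even with settings like these, the active segment itself is not compacted, so they reduce duplicates rather than eliminate them.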

However, I am quite convinced that you will not be able to guarantee that your consumer never receives duplicates for a key from a log-compacted topic.
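If the goal is one value per key, one possible workaround is to deduplicate on the consumer side by keeping only the latest value seen for each key. The sketch below is plain Python over already-fetched `(key, value)` pairs; the `latest_by_key` helper and the sample data are hypothetical, and it assumes the consumer can buffer records up to some known end point before handing them on. A `None` value is treated as a tombstone, i.e. a delete marker for that key.

```python
from typing import Dict, Iterable, Optional, Tuple


def latest_by_key(records: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Fold records (in offset order) into a key -> latest value mapping.
    Hypothetical helper for illustration only."""
    state: Dict[str, str] = {}
    for key, value in records:
        if value is None:
            state.pop(key, None)  # tombstone: the key was deleted
        else:
            state[key] = value    # a newer value replaces the older one
    return state


# Records as a consumer might receive them, including duplicates
# that still sit in the uncompacted part of the log.
polled = [
    ("user-1", "c"),
    ("user-2", "b"),
    ("user-1", "d"),
    ("user-2", None),
    ("user-1", "e"),
]

deduplicated = latest_by_key(polled)
print(deduplicated)
```

This moves the "exactly one value per key" guarantee out of the topic and into the application, at the cost of holding the per-key state in memory or in a local store.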