As I understand it, idempotence means that each message is written exactly once to the log. With EOS (exactly-once semantics), however, the consumer also plays a role in the end-to-end guarantee.
So, as stated here (for instance), idempotence and transactions together are necessary and sufficient for "end-to-end exactly-once semantics".
However, the Kafka documentation on compression says:
> Since data is stored in compressed format on the broker, valid fetch offsets are the compressed message boundaries. Hence, for compressed data, the consumed offset will be advanced one compressed message at a time. This has the side effect of possible duplicates in the event of a consumer failure.
Question:
- Even if the producer is idempotent and the consumer is transactional within Kafka (e.g. Kafka Streams), could I still see duplicates, since the offset is only advanced at compressed-message boundaries? Consider a consumer that has processed a single message that is not at the boundary of a compressed message: the offset is not advanced, so if the consumer fails at that point, that message will be processed again. Is that correct?
If so, enabling compression would seem to undermine the EOS guarantees, yet I see no mention of this in articles discussing EOS.
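
For concreteness, the setup I have in mind looks roughly like the sketch below. The class name, method names and the transactional id are made up for illustration. The keys are standard Kafka client settings as far as I know, and connection settings such as `bootstrap.servers` and the serializers are left out, so this is only a fragment and not a complete client configuration.

```java
import java.util.Properties;

// Hypothetical holder class: only the config keys and values are Kafka settings.
class EosWithCompressionConfig {

    // Idempotent, transactional producer that also compresses its batches.
    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("enable.idempotence", "true");
        p.setProperty("acks", "all"); // idempotence requires acks=all
        p.setProperty("transactional.id", "my-transactional-id"); // made-up id
        p.setProperty("compression.type", "gzip"); // the setting this question is about
        return p;
    }

    // Plain consumer that only reads committed transactional data
    // and does not auto-commit its offsets.
    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("isolation.level", "read_committed");
        p.setProperty("enable.auto.commit", "false");
        return p;
    }

    // Kafka Streams equivalent; assumes a version that accepts "exactly_once_v2"
    // (older versions use "exactly_once" instead).
    static Properties streamsProps() {
        Properties p = new Properties();
        p.setProperty("processing.guarantee", "exactly_once_v2");
        return p;
    }

    public static void main(String[] args) {
        System.out.println("producer: " + producerProps());
        System.out.println("consumer: " + consumerProps());
        System.out.println("streams:  " + streamsProps());
    }
}
```

The question is whether `compression.type` set to anything other than `none` changes the guarantees that the other settings are supposed to provide.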