Kafka Streams exactly_once_v2 produces duplicates after application restart

We have a Kafka Streams application with processing.guarantee=exactly_once_v2.

Kafka version: 3.2.0

Kafka Streams version: 3.0.1

Confluent version: 7.0.1

The other configurations necessary for exactly-once processing are also set:

Producer:

enable.idempotence=true

acks=all

Consumer:

isolation.level=read_committed
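For reference, a minimal sketch of how such a Streams configuration could look with plain string keys. The values "my-app" and "localhost:9092" are placeholders, not taken from the question. As far as I know, under exactly_once_v2 Streams applies its own overrides (idempotent, transactional producers and read_committed consumers), so the producer and consumer settings listed above are not strictly required.

```java
import java.util.Properties;

public class EosConfigSketch {
    // Builds a minimal Kafka Streams config. "my-app" and "localhost:9092"
    // are placeholder values for illustration only.
    static Properties streamsProps() {
        Properties props = new Properties();
        props.setProperty("application.id", "my-app");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // The key is processing.guarantee (dot-separated, "guarantee").
        props.setProperty("processing.guarantee", "exactly_once_v2");
        // transactional.id is deliberately not set: under exactly-once,
        // Streams derives it itself and ignores a user-supplied value.
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting properties so they can be inspected.
        streamsProps().list(System.out);
    }
}
```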

But when my application crashes and I restart it, duplicates are produced to the target topic.

I noticed that a new transactional.id is created after each restart, so I tried to configure transactional.id in code (let's say transactional.id=my-app) to keep the same transactional ID across application restarts. But then I found this in the log:

2023-01-11 19:30:42.791  WARN 31299 --- [           main] org.apache.kafka.streams.StreamsConfig   : Unexpected user-specified producer config: transactional.id found. processing.guarantee is set to exactly_once_v2. Hence, User setting (my-app) will be ignored and the Streams default setting (<appId>-<generatedSuffix>) will be used
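The warning says Streams builds the ID as <appId>-<generatedSuffix>. A toy model of that pattern, assuming (per KIP-447, not confirmed in the question) that the suffix is a per-instance process UUID plus a stream-thread index. The helper transactionalId is hypothetical, not a Streams API, and drawing a fresh random UUID on every start is a simplification of the real behavior.

```java
import java.util.UUID;

public class TransactionalIdSketch {
    // Hypothetical model of "<appId>-<generatedSuffix>": application id,
    // then a per-instance process UUID, then a thread index. The exact
    // format is an internal Streams detail; this is only an illustration.
    static String transactionalId(String applicationId, UUID processId, int threadIndex) {
        return applicationId + "-" + processId + "-" + threadIndex;
    }

    public static void main(String[] args) {
        String appId = "my-app";
        // In this model each start draws a new random process id,
        // so the derived transactional.id differs across restarts.
        String beforeCrash = transactionalId(appId, UUID.randomUUID(), 1);
        String afterRestart = transactionalId(appId, UUID.randomUUID(), 1);
        System.out.println(beforeCrash);
        System.out.println(afterRestart);
        System.out.println("same id after restart? " + beforeCrash.equals(afterRestart));
    }
}
```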

Did I miss something?

I thought the problem was that a new transactional.id is created after every restart, so the application cannot resume where it left off before it failed. But then why does exactly_once_v2 not allow setting transactional.id?

Does anyone have experience with exactly_once_v2?

There is 1 answer below.

Kafka Streams manages transactional.id internally, and with eos_v2 it is expected that new transactional.ids are created (cf. KIP-447 for details).
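Why a changing transactional.id is not a correctness problem: with KIP-447, zombie fencing is tied to the consumer group metadata that the producer passes to KafkaProducer.sendOffsetsToTransaction, rather than to a stable transactional.id. The sketch below is a toy model of that idea only; ToyCoordinator and its methods are hypothetical and do not reflect Kafka's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class GenerationFencingSketch {
    // Toy stand-in for the group coordinator: it tracks the current group
    // generation and rejects offset commits that carry an older generation.
    static final class ToyCoordinator {
        private int currentGeneration = 0;
        private final Map<Integer, Long> committedOffsets = new HashMap<>();

        // A rebalance (e.g. after a crash and restart) bumps the generation.
        int rebalance() {
            return ++currentGeneration;
        }

        // Returns true if the commit is accepted, false if the caller is fenced.
        boolean commitOffset(int generation, int partition, long offset) {
            if (generation < currentGeneration) {
                return false; // stale member: treated as a zombie
            }
            committedOffsets.put(partition, offset);
            return true;
        }

        Long committed(int partition) {
            return committedOffsets.get(partition);
        }
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();
        int oldGen = coordinator.rebalance(); // original instance: generation 1
        int newGen = coordinator.rebalance(); // restarted instance: generation 2

        // The restarted instance commits with the current generation.
        System.out.println("new instance accepted: " + coordinator.commitOffset(newGen, 0, 42L));
        // A zombie still holding generation 1 is rejected, regardless of
        // which transactional.id its producer uses.
        System.out.println("zombie accepted: " + coordinator.commitOffset(oldGen, 0, 40L));
        System.out.println("committed offset: " + coordinator.committed(0));
    }
}
```

Running main prints that the new instance is accepted, the zombie is rejected, and the committed offset stays at 42.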

Off the top of my head, I am not aware of any correctness issues with eos_v2 in 3.0.1. Did you check the Kafka Jira?

But when my application crashed and I restart it, duplicates are produced to target topic.

Can you elaborate a bit more? How do you verify this? Is the downstream consumer also configured with "read_committed" mode? Was there any failure in the downstream consumer? Note that if the downstream consumer fails, it might re-read the same committed message twice; neither eos_v2 nor "read_committed" mode can prevent this.
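One common source of apparent duplicates (an assumption about this case, not a diagnosis): records written before the crash belong to a transaction that is later aborted, and after the restart the same input is reprocessed and written again in a committed transaction. A consumer with the default isolation.level=read_uncommitted (kafka-console-consumer also defaults to it, see its --isolation-level option) sees both copies. The toy model below, with hypothetical names Rec, poll and duplicatedValues, shows that effect and also separates real duplicates (same value at distinct offsets) from a downstream re-read of the same offset. Control markers are omitted from the toy log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateCheckSketch {
    // A record in a toy output partition (control markers omitted).
    static final class Rec {
        final long offset;
        final String value;
        final boolean committed; // false if the writing transaction was aborted

        Rec(long offset, String value, boolean committed) {
            this.offset = offset;
            this.value = value;
            this.committed = committed;
        }
    }

    // read_committed skips records from aborted transactions;
    // read_uncommitted returns everything.
    static List<Rec> poll(List<Rec> log, boolean readCommitted) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : log) {
            if (!readCommitted || r.committed) {
                out.add(r);
            }
        }
        return out;
    }

    // Values stored at more than one distinct offset. Seeing the same offset
    // twice (a downstream re-read) is deliberately not reported.
    static Set<String> duplicatedValues(List<Rec> consumed) {
        Map<String, Set<Long>> offsetsByValue = new HashMap<>();
        for (Rec r : consumed) {
            offsetsByValue.computeIfAbsent(r.value, v -> new HashSet<>()).add(r.offset);
        }
        Set<String> dups = new HashSet<>();
        for (Map.Entry<String, Set<Long>> e : offsetsByValue.entrySet()) {
            if (e.getValue().size() > 1) {
                dups.add(e.getKey());
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // "order-1" was first written in a transaction aborted by the crash,
        // then written again by the restarted instance and committed.
        List<Rec> log = List.of(
                new Rec(0, "order-1", false),
                new Rec(1, "order-1", true),
                new Rec(2, "order-2", true));

        System.out.println("read_uncommitted: " + duplicatedValues(poll(log, false)));
        System.out.println("read_committed: " + duplicatedValues(poll(log, true)));

        // A downstream consumer that fails before committing re-reads offset 2.
        List<Rec> reRead = new ArrayList<>(poll(log, true));
        reRead.add(log.get(2));
        System.out.println("after re-read: " + duplicatedValues(reRead));
    }
}
```

Running main prints read_uncommitted: [order-1], read_committed: [] and after re-read: []. Checking offsets this way can help tell whether the target topic really contains duplicates or the consumer is only reading the same records again.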