Mirrored repartition topics keep increasing in size


We are using MirrorMaker to back up topics. We noticed that repartition topics created by Kafka Streams applications keep growing in the target cluster, while the same topics in the source cluster do not. This makes sense according to the documentation of org.apache.kafka.streams.kstream.KStream#repartition():

Similar to auto-repartitioning, the topic will be created with infinite retention time and data will be automatically purged by Kafka Streams.

In other words, since we do not have the Kafka Streams applications running in the target cluster, the automatic purge is not happening.
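The following toy model (plain Java, not Kafka client code) illustrates the reasoning above. It assumes that Kafka Streams keeps a repartition topic small by advancing its log start offset once records have been consumed and committed (a deleteRecords-style purge), and that mirroring copies the records themselves but not that truncation. The classes `ToyLog` and `MirrorGrowthDemo` are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy log: records can be appended, and the log start offset can be
// advanced to "purge" older records, loosely mimicking a deleteRecords call.
class ToyLog {
    private final List<String> records = new ArrayList<>();
    private long logStartOffset = 0; // first offset that is still retained

    long append(String value) {
        records.add(value);
        return records.size() - 1; // offset of the appended record
    }

    // Drop everything before the given offset (never moves backwards, never past the end).
    void purgeBefore(long offset) {
        logStartOffset = Math.max(logStartOffset, Math.min(offset, records.size()));
    }

    long retainedCount() {
        return records.size() - logStartOffset;
    }
}

class MirrorGrowthDemo {
    // Writes identical batches to a "source" and a "target" log. Only the source is
    // purged, because no Streams application runs against the target cluster.
    static long[] simulate(int batches, int recordsPerBatch) {
        ToyLog source = new ToyLog();
        ToyLog target = new ToyLog();
        for (int b = 0; b < batches; b++) {
            long last = -1;
            for (int i = 0; i < recordsPerBatch; i++) {
                String value = "batch" + b + "-rec" + i;
                last = source.append(value);
                target.append(value); // the mirror copies records, not purges
            }
            // The Streams app commits and purges consumed records on the source only.
            source.purgeBefore(last + 1);
        }
        return new long[] {source.retainedCount(), target.retainedCount()};
    }

    public static void main(String[] args) {
        long[] sizes = simulate(5, 100);
        System.out.println("source retained: " + sizes[0] + ", target retained: " + sizes[1]);
    }
}
```

Running `main` prints `source retained: 0, target retained: 500`: the source stays bounded while the mirrored copy retains every record ever written.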

Are we understanding this correctly? How do we ensure that we back up the source cluster without losing data and without the target cluster size increasing beyond the source cluster size?

Edit December 1, 2021: We still have this issue. Is MirrorMaker even the right choice of tool? Do we need to consider Replicator or Cluster Linking?

Edit August 30, 2022: We have concluded that we need to exclude repartition topics from mirroring. If and when Kafka Streams applications are started against the mirrored data, they should recreate their repartition topics and use them as new input data from the source topics is processed. Please share any comments and thoughts. Thank you.
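A sketch of what excluding repartition topics could look like, assuming MirrorMaker 2 and its `topics` / `topics.exclude` properties, which take comma-separated regular expressions. The cluster aliases `source` and `target`, the `.*-repartition` pattern, and the retained default exclusions (what we understand MM2's default `topics.exclude` to be) are assumptions; check them against the MirrorMaker 2 documentation for your Kafka version. The `excluded` helper only imitates the idea of MM2's default topic filter (join the patterns into one alternation and require a full match) so that patterns can be checked locally; it is not MM2 code, and a real configuration also needs settings such as bootstrap servers, which are omitted here.

```java
import java.util.Properties;
import java.util.regex.Pattern;

class RepartitionExclusion {
    // Hypothetical MirrorMaker 2 properties for a source->target flow.
    static Properties mm2Config() {
        Properties p = new Properties();
        p.setProperty("clusters", "source, target");
        p.setProperty("source->target.enabled", "true");
        p.setProperty("source->target.topics", ".*");
        // Assumed MM2 default exclusions, plus Kafka Streams repartition topics,
        // which are named <application.id>-<name>-repartition.
        p.setProperty("source->target.topics.exclude",
                ".*[\\-\\.]internal, .*\\.replica, __.*, .*-repartition");
        return p;
    }

    // Local check: does the comma-separated regex list fully match the topic name?
    static boolean excluded(String excludeList, String topic) {
        String joined = String.join("|", excludeList.split("\\s*,\\s*"));
        return Pattern.compile(joined).matcher(topic).matches();
    }

    public static void main(String[] args) {
        String exclude = mm2Config().getProperty("source->target.topics.exclude");
        String[] topics = {
            "orders", // hypothetical input topic
            "my-app-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition",
            "my-app-KSTREAM-AGGREGATE-STATE-STORE-0000000003-changelog"
        };
        for (String t : topics) {
            System.out.println(t + " excluded=" + excluded(exclude, t));
        }
    }
}
```

With these patterns, only the repartition topic is excluded; the input topic and the changelog topic (`my-app` is a hypothetical `application.id`) are still mirrored, which matches the plan of excluding repartition topics only.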

Answer

Upgrading to Kafka 3.1.1 (from 2.7) resulted in repartition topics being created with different settings in the target cluster. For example, they were created with cleanup.policy=delete,segment.bytes=52428800,retention.ms=-1,message.format.version=3.0-IV1,max.message.bytes=2000024 instead of cleanup.policy=compact,segment.bytes=104857600,message.format.version=3.0-IV1,min.cleanable.dirty.ratio=0.25. We had to delete the existing topics so that they would be recreated with the new settings.
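To see exactly which settings changed, the two comma-separated config strings quoted above can be parsed and compared. This is a small standalone sketch; `TopicConfigDiff` and its methods are hypothetical helpers, not part of any Kafka tool, and they assume simple `key=value` pairs with no commas inside values.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class TopicConfigDiff {
    // Parse "k1=v1,k2=v2" into a sorted map.
    static Map<String, String> parse(String csv) {
        Map<String, String> m = new TreeMap<>();
        for (String pair : csv.split(",")) {
            String[] kv = pair.trim().split("=", 2);
            m.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return m;
    }

    // Return only the keys whose value differs, formatted as "before -> after".
    static Map<String, String> diff(Map<String, String> before, Map<String, String> after) {
        Set<String> keys = new TreeSet<>(before.keySet());
        keys.addAll(after.keySet());
        Map<String, String> out = new TreeMap<>();
        for (String k : keys) {
            String b = before.getOrDefault(k, "<unset>");
            String a = after.getOrDefault(k, "<unset>");
            if (!a.equals(b)) {
                out.put(k, b + " -> " + a);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String previous = "cleanup.policy=compact,segment.bytes=104857600,"
                + "message.format.version=3.0-IV1,min.cleanable.dirty.ratio=0.25";
        String upgraded = "cleanup.policy=delete,segment.bytes=52428800,retention.ms=-1,"
                + "message.format.version=3.0-IV1,max.message.bytes=2000024";
        diff(parse(previous), parse(upgraded)).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Only `message.format.version` is unchanged. Note that `retention.ms=-1` disables time-based retention, so with `cleanup.policy=delete` the recreated target topics still do not age out records on their own.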