We want to use GlobalKTable in Kafka streams application. Input topics(KTable/KStream) have N partitions and a GlobalKTable will be used as a dictionary in the stream application.

Does the input topic for the GlobalKTable must have the same number of partitions as other input topics (which are sources of KTable/KStream)?

As I understand, the answer is NO(it is not limited and the topic may also have M partitions where N > M), because GlobalKTable is fully loaded in each instance of the stream application and the co-partitioning is not required during KStream join operation. But I need confirmation from the experts!

Thank you!

2

There are 2 best solutions below

0
On BEST ANSWER

No, The number of partitions for topics for KStream and GlobalTable (that are join) can differ.

From Kafka Streams developer guide

At a high-level, KStream-GlobalKTable joins are very similar to KStream-KTable joins. However, global tables provide you with much more flexibility at the some expense when compared to partitioned tables:

  • They do not require data co-partitioning.

More details can be found here:

Global Table join

Join co-partitioning requirements

3
On

More accurately:

Why is data co-partitioning required? Because KStream-KStream, KTable-KTable, and KStream-KTable joins are performed based on the keys of records (e.g., leftRecord.key == rightRecord.key), it is required that the input streams/tables of a join are co-partitioned by key.

The only exception are KStream-GlobalKTable joins. Here, co-partitioning is it not required because all partitions of the GlobalKTable‘s underlying changelog stream are made available to each KafkaStreams instance, i.e. each instance has a full copy of the changelog stream. Further, a KeyValueMapper allows for non-key based joins from the KStream to the GlobalKTable.