In the Kafka Streams library, I want to know the difference between KTable and GlobalKTable.
Also, the KStream class has two methods, leftJoin() and outerJoin(). What is the difference between these two methods?
I read KStream.leftJoin, but did not manage to find an exact difference.
KTable VS GlobalKTable
A KTable shards the data across all running Kafka Streams instances, while a GlobalKTable has a full copy of all data on each instance. The disadvantage of a GlobalKTable is that it needs more memory. The advantage is that you can do a KStream-GlobalKTable join on a non-key attribute of the stream. A KStream-KTable join on a non-key stream attribute is only possible by extracting the join attribute and setting it as the key before the join, which results in a repartitioning step of the stream before the join can be computed.
Note, though, that there is also a semantic difference: for a stream-table join, Kafka Streams aligns record processing based on record timestamps. Thus, updates to the table are aligned with the records of your stream. For a GlobalKTable, there is no time synchronization, and thus updates to the GlobalKTable are completely decoupled from the processing of the stream records (so you get weaker semantics).
For further details, see KIP-99: Add Global Tables to Kafka Streams.
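To make the key vs. non-key distinction concrete, here is a minimal sketch of both joins in the DSL. The topic names ("orders", "customers", "products"), the comma-separated value format, and the field() helper are all made up for illustration, and the code assumes the kafka-streams library is on the classpath with String default serdes configured at runtime.

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class TableJoinSketch {

    // Hypothetical value format for an order: "customerId,productId,amount".
    static String field(String csv, int index) {
        return csv.split(",")[index];
    }

    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Assumed topics; keys and values are plain strings in this sketch.
        KStream<String, String> orders = builder.stream("orders");
        KTable<String, String> customers = builder.table("customers");
        GlobalKTable<String, String> products = builder.globalTable("products");

        // KStream-KTable join: the join always uses the stream's key, so the
        // customer id must first become the key. Changing the key marks the
        // stream for repartitioning before the join.
        orders
            .selectKey((orderId, order) -> field(order, 0))
            .join(customers, (order, customer) -> order + " | " + customer)
            .to("orders-with-customer");

        // KStream-GlobalKTable join: the second argument maps each stream
        // record to the table key to look up, so no re-keying is needed.
        orders
            .join(products,
                  (orderId, order) -> field(order, 1),
                  (order, product) -> order + " | " + product)
            .to("orders-with-product");

        return builder.build();
    }

    public static void main(String[] args) {
        // Printing the topology description shows the extra repartition
        // topic that the KTable variant introduces.
        System.out.println(build().describe());
    }
}
```

Only the first pipeline should show a repartition topic in the printed description. The second reads the stream as-is, at the cost of every instance holding the full "products" table.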
leftJoin() VS outerJoin()
Left and outer joins work like a left-outer and a full-outer join in a database, respectively.
For a left outer join, you might "lose" data of your right input stream if there is no match for the join on the left-hand side.
For a (full) outer join, no data is dropped, and each input record of both streams will be in the result stream.
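The difference can be illustrated with a small plain-Java simulation of the database-style semantics described above. This is only a sketch: it uses in-memory maps instead of Kafka Streams, ignores join windows and timing entirely, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JoinSemanticsSketch {

    // Left join: one result per left record; the right side is null
    // when the right input has no record with the same key.
    static List<String> leftJoin(Map<String, String> left, Map<String, String> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : left.entrySet()) {
            out.add(e.getKey() + ":" + e.getValue() + "/" + right.get(e.getKey()));
        }
        return out;
    }

    // Outer join: the left-join result plus every right record
    // that found no match on the left.
    static List<String> outerJoin(Map<String, String> left, Map<String, String> right) {
        List<String> out = leftJoin(left, right);
        for (Map.Entry<String, String> e : right.entrySet()) {
            if (!left.containsKey(e.getKey())) {
                out.add(e.getKey() + ":null/" + e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> left = new LinkedHashMap<>();
        left.put("a", "L1");
        left.put("b", "L2");

        Map<String, String> right = new LinkedHashMap<>();
        right.put("b", "R2");
        right.put("c", "R3");

        System.out.println(leftJoin(left, right));  // [a:L1/null, b:L2/R2]
        System.out.println(outerJoin(left, right)); // [a:L1/null, b:L2/R2, c:null/R3]
    }
}
```

The right-side record with key "c" is missing from the left-join result but appears in the outer-join result, paired with null.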