Join data from 4 topics using Kafka Streams when the topics are not updated at the same rate

I am working on a requirement to process data ingested from a SQL data store into a Kafka broker, across 4 different topics corresponding to 4 different tables in the SQL data store. I am using Kafka Connect to ingest the data into the topics.

I now want to join the data from these topics, aggregate it, and write the result back to another topic. That topic will in turn be subscribed to by a consumer that populates a NoSQL data store, which will be used to render the UI.
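For context, the downstream consumer I have in mind would look roughly like this minimal sketch (the topic name result-topic, the group id, and the saveToNosqlStore() helper are placeholders for my actual setup):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ResultTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("group.id", "nosql-populator");         // hypothetical group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("result-topic")); // assumed output topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // hypothetical helper: upsert the aggregated record into the NoSQL store
                    saveToNosqlStore(record.key(), record.value());
                }
            }
        }
    }

    // placeholder for whatever NoSQL client write you actually use
    private static void saveToNosqlStore(String key, String value) { /* ... */ }
}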

I know Kafka Streams can be used to join topics.

My concern is that the data ingested from the SQL data store tables will not arrive at the same rate in all 4 topics. Only 2 of the tables receive regular updates. A third is updated, but not at the same frequency as the other 2. The remaining one is static (a sort of master table).

So, I am not sure how we can actually join them with Kafka Streams when the record counts across the topics will not match.

Has anyone faced a similar issue? If so, can you please share your thoughts/code snippets on the same?

BEST ANSWER

The number of rows doesn't matter at all... why should it have any impact on the join result? A KTable treats each topic as a changelog: for every key, only the latest value is kept, and a KTable-KTable join is evaluated per key whenever either side is updated. So a frequently updated topic joins against a static master topic just fine, as long as the records share a key.

You can just read all 4 topics as a KTable each, and do the join. Finally, you apply an aggregation to the join-result KTable and write the final result to a topic. Something like this:

KTable<String, String> t1 = builder.table("topic1");
KTable<String, String> t2 = builder.table("topic2");
KTable<String, String> t3 = builder.table("topic3");
KTable<String, String> t4 = builder.table("topic4");

// each join takes a ValueJoiner that combines the two matching values
KTable<String, String> joinResult = t1.join(t2, (v1, v2) -> v1 + v2)
    .join(t3, (v, v3) -> v + v3)
    .join(t4, (v, v4) -> v + v4);

// a KTable is grouped with groupBy(); aggregate() takes an initializer, an adder,
// and a subtractor, and the result must be converted to a stream before writing it out
joinResult.groupBy((k, v) -> KeyValue.pair(k, v))
    .aggregate(() -> "", (k, v, agg) -> agg + v, (k, v, agg) -> agg)
    .toStream()
    .to("result-topic");
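To make the point about row counts concrete, here is a fuller sketch of how the topology might be wired up and run. The topic names, the application id, the localhost:9092 broker address, and the string-concatenation joiners are all assumptions for illustration; this also assumes the 4 topics are keyed by the same primary key, which a KTable-KTable join requires.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

public class JoinFourTopics {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "four-table-join");   // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KTable<String, String> t1 = builder.table("topic1");
        KTable<String, String> t2 = builder.table("topic2");
        KTable<String, String> t3 = builder.table("topic3");
        KTable<String, String> t4 = builder.table("topic4"); // the static "master" table

        // inner join per key: an output row exists once all 4 tables have a value for the
        // key, and it is re-emitted whenever any of the 4 sides updates that key
        KTable<String, String> joinResult = t1.join(t2, (v1, v2) -> v1 + "|" + v2)
                .join(t3, (v, v3) -> v + "|" + v3)
                .join(t4, (v, v4) -> v + "|" + v4);

        joinResult.toStream().to("result-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Since the join here is key-preserving, the result can be written out directly; the groupBy()/aggregate() step from the snippet above is only needed if you want to re-key the data or fold several rows into one output record. Because each KTable keeps only the latest value per key, it makes no difference that two topics are updated constantly while the master topic almost never changes: whenever any side updates a key, the join output for that key is refreshed.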