KStream - KTable Join - Patterns to Update Already Enriched Records when Reference Data on KTable Side Gets an Update


I have a question about KStream-KTable joins. This kind of join is usually used for data enrichment, where the KTable provides the reference data.

So the question is: when a KTable record gets an update, how do we go about updating the older records that we have already processed, enriched, and probably stored in some data store?

Are there any patterns that we can follow?

(Please assume a KTable-KTable join isn't an option, as the KStream side would emit a large volume of changes.)

1 Answer

I tend to think of such joins as enriching the stream of data. In that view, records which have come through the join before an update to the KTable are "correct" at the time.
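To make that concrete, here is a rough plain-Python simulation of how I understand stream-table join semantics. It is not Kafka Streams code; the function name `stream_table_join`, the keys, and the values are all made up for illustration. Table updates only change the lookup state, and only stream records produce output.

```python
def stream_table_join(events):
    """Simulate an inner stream-table join.

    events: list of (side, key, value) tuples in arrival order,
    where side is "table" or "stream" (hypothetical format).
    """
    table = {}   # latest reference value per key
    output = []
    for side, key, value in events:
        if side == "table":
            # A table update only changes state; nothing is emitted,
            # and earlier output is never revised.
            table[key] = value
        elif key in table:
            # A stream record is enriched with the table value
            # that is current at the moment it arrives.
            output.append((key, value, table[key]))
    return output


events = [
    ("table", "cust-1", "Silver"),
    ("stream", "cust-1", "order-100"),
    ("table", "cust-1", "Gold"),       # reference data changes
    ("stream", "cust-1", "order-101"),
]
enriched = stream_table_join(events)
print(enriched)
# [('cust-1', 'order-100', 'Silver'), ('cust-1', 'order-101', 'Gold')]
```

Note that `order-100` keeps the `Silver` value: the later table update does not trigger any new output for it.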

I can see two options to consider:

First, as a Kafka Streams option, would a KStream-KStream join work? It sounds like that's the processing semantics that you'd like. (Incidentally, I really like the docs for showing clear examples of when records are and are not emitted: https://kafka.apache.org/31/documentation/streams/developer-guide/dsl-api.html#kstream-kstream-join)
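For comparison, here is a similarly rough plain-Python simulation of a windowed inner stream-stream join, assuming the reference data is also consumed as a stream. Again, this is not Kafka Streams code; `stream_stream_join`, the `window` parameter, and the sample data are hypothetical. The point is that a record arriving on either side is joined against buffered records from the other side that fall within the window.

```python
def stream_stream_join(events, window):
    """Simulate a windowed inner stream-stream join.

    events: list of (side, key, value, timestamp) in arrival order,
    where side is "left" (data) or "right" (reference updates).
    """
    left, right, output = [], [], []   # per-side buffers of (key, value, ts)
    for side, key, value, ts in events:
        mine, other = (left, right) if side == "left" else (right, left)
        mine.append((key, value, ts))
        # A new record on either side joins with every matching record
        # on the other side whose timestamp lies within the window.
        for other_key, other_value, other_ts in other:
            if other_key == key and abs(ts - other_ts) <= window:
                if side == "left":
                    output.append((key, value, other_value))
                else:
                    output.append((key, other_value, value))
    return output


events = [
    ("right", "cust-1", "Silver", 0),
    ("left", "cust-1", "order-100", 5),
    ("right", "cust-1", "Gold", 8),        # update inside the window
    ("right", "cust-1", "Platinum", 100),  # update outside the window
]
joined = stream_stream_join(events, window=10)
print(joined)
# [('cust-1', 'order-100', 'Silver'), ('cust-1', 'order-100', 'Gold')]
```

So `order-100` is emitted again when `Gold` arrives, but not for `Platinum`, which falls outside the window. The window size bounds how late a reference update can still re-enrich an older record.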

Second, since it sounds like you may be persisting the streaming data, it may make sense to do query-time enrichment instead. Creating a view (or join) over the two tables in the data store may be a sane alternative to reprocessing data that is already in the database.
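As a small sketch of query-time enrichment, the following uses an in-memory SQLite database purely as a stand-in for whatever data store you have. The table names (`orders`, `customers`), the view name (`enriched_orders`), and the columns are assumptions for illustration. The raw records and the reference data are stored separately, and the view joins them when queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT);
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, tier TEXT);

-- Enrichment happens at query time, not at write time.
CREATE VIEW enriched_orders AS
  SELECT o.order_id, o.customer_id, c.tier
  FROM orders o
  LEFT JOIN customers c ON c.customer_id = o.customer_id;
""")

conn.execute("INSERT INTO customers VALUES ('cust-1', 'Silver')")
conn.execute("INSERT INTO orders VALUES ('order-100', 'cust-1')")
before = conn.execute("SELECT * FROM enriched_orders").fetchall()

# Updating the reference row is enough; no stored order is rewritten.
conn.execute("UPDATE customers SET tier = 'Gold' WHERE customer_id = 'cust-1'")
after = conn.execute("SELECT * FROM enriched_orders").fetchall()

print(before)  # [('order-100', 'cust-1', 'Silver')]
print(after)   # [('order-100', 'cust-1', 'Gold')]
```

The trade-off is that the join cost moves to read time, and you only ever see the latest reference value rather than the value at the time the record was processed.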