Question on KStream-KTable joins. Usually this kind of join is used for data enrichment, where the KTable provides the reference data.
So the question is, when the KTable record gets an update, how do we go about updating the older records that we already processed, enriched and probably stored in some data store?
Are there any patterns that we can follow?
(Please assume a KTable-KTable join wouldn't be an option, as the KStream side would emit a large volume of changes.)
I tend to think of such joins as enriching the stream of data. In that view, records which have come through the join before an update to the KTable are "correct" at the time.
I can see two options to consider:
First, as a Kafka Streams option, would a KStream-KStream join work? It sounds like that's the processing semantics that you'd like. (Incidentally, I really like the docs for showing clear examples of when records are and are not emitted: https://kafka.apache.org/31/documentation/streams/developer-guide/dsl-api.html#kstream-kstream-join)
Second, since it sounds like you may be persisting the streaming data, it may make sense to do query-time enrichment. Creating a view/join over the two tables in the data store may provide a sane alternative to reprocessing data in the database.
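To make the query-time enrichment idea concrete, here is a minimal plain-Java sketch. It is not Kafka Streams code and not a real database: the class, the Order/EnrichedOrder shapes, and the "customer tier" reference data are all hypothetical stand-ins. The assumption is that the stream records are stored un-enriched, the reference data is stored separately as latest-value-per-key, and the join happens only when the data is read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a data store; every name here is hypothetical.
public class QueryTimeEnrichment {

    // One stored stream record, kept un-enriched (only the reference key is stored).
    public static final class Order {
        public final String orderId;
        public final String customerId;

        public Order(String orderId, String customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }
    }

    // One row of the "view": the stored record plus the reference value seen at read time.
    public static final class EnrichedOrder {
        public final String orderId;
        public final String customerId;
        public final String customerTier; // null when no reference row exists for the key

        public EnrichedOrder(String orderId, String customerId, String customerTier) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.customerTier = customerTier;
        }
    }

    // Table fed by the stream side: append-only records.
    private final List<Order> orders = new ArrayList<>();
    // Table fed by the reference side: latest value per key, like a KTable.
    private final Map<String, String> tierByCustomer = new HashMap<>();

    public void storeOrder(String orderId, String customerId) {
        orders.add(new Order(orderId, customerId));
    }

    public void upsertTier(String customerId, String tier) {
        tierByCustomer.put(customerId, tier);
    }

    // The "view": a left join evaluated on every read, so old rows see current reference data.
    public List<EnrichedOrder> enrichedOrders() {
        List<EnrichedOrder> out = new ArrayList<>();
        for (Order o : orders) {
            out.add(new EnrichedOrder(o.orderId, o.customerId, tierByCustomer.get(o.customerId)));
        }
        return out;
    }

    public static void main(String[] args) {
        QueryTimeEnrichment store = new QueryTimeEnrichment();
        store.storeOrder("o1", "c1");
        store.upsertTier("c1", "silver");
        System.out.println(store.enrichedOrders().get(0).customerTier);

        // A later reference update needs no reprocessing of the stored order.
        store.upsertTier("c1", "gold");
        System.out.println(store.enrichedOrders().get(0).customerTier);
    }
}
```

The trade-off in this sketch is that the enriched value is never frozen: if you also need "what the reference data was when the record arrived", that would have to be stored separately.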