I have a Kafka stream of actual data that needs to be enriched with reference data.
The reference data can be fetched via an API call, and any updates to the reference data arrive on another Kafka topic.
I want to fetch the whole reference data set via the API on startup and store it in Flink state, then update that state from the Kafka reference-data topic.
How can this state be shared across task slots in multiple task managers, so that I can connect the actual data stream and the reference data stream to do the enrichment? The goal is to make sure all parallel operators have access to the whole reference data set and see the same state.
Is there any better way?
Flink's broadcast state pattern should work for you, as far as ensuring that all enrichment data is available to all sub-tasks.
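Here's a minimal sketch of the broadcast side, assuming hypothetical `Event`, `RefData`, and `EnrichedEvent` types (the accessors `getRefKey()` and `getKey()` are placeholders). It's a fragment from a job's `main()`, with `eventStream` and `refDataStream` already defined:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

// Describes the broadcast state: reference records keyed by a String id.
MapStateDescriptor<String, RefData> refStateDescriptor =
        new MapStateDescriptor<>(
                "referenceData",
                BasicTypeInfo.STRING_TYPE_INFO,
                TypeInformation.of(RefData.class));

// Broadcasting sends every reference record/update to every parallel sub-task.
BroadcastStream<RefData> refBroadcast = refDataStream.broadcast(refStateDescriptor);

DataStream<EnrichedEvent> enriched = eventStream
        .connect(refBroadcast)
        .process(new BroadcastProcessFunction<Event, RefData, EnrichedEvent>() {

            @Override
            public void processElement(Event event, ReadOnlyContext ctx,
                                       Collector<EnrichedEvent> out) throws Exception {
                // Read-only lookup against this sub-task's local copy of the broadcast state.
                RefData ref = ctx.getBroadcastState(refStateDescriptor).get(event.getRefKey());
                out.collect(new EnrichedEvent(event, ref));
            }

            @Override
            public void processBroadcastElement(RefData update, Context ctx,
                                                Collector<EnrichedEvent> out) throws Exception {
                // Every sub-task receives and applies the same update,
                // so all copies of the state stay identical.
                ctx.getBroadcastState(refStateDescriptor).put(update.getKey(), update);
            }
        });
```

Keep in mind that each parallel instance holds its own copy of the broadcast state, so memory use scales with parallelism; that's the trade-off for local lookups with no network calls during enrichment.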
For starting with all of the API-based data and then adding updates from Kafka, I would use Flink's hybrid source support. In your
`main()` method, download the enrichment data from the API and save it to files, then use a hybrid source that first reads from the file system and switches to Kafka once that's complete.
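A sketch of that wiring, assuming a recent Flink version (HybridSource was added in 1.14) with the file and Kafka connectors on the classpath; the snapshot path, broker address, and topic name are placeholders, and `env` is your `StreamExecutionEnvironment`:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;

// Bounded source over the files that main() wrote from the API download.
FileSource<String> fileSource = FileSource
        .forRecordStreamFormat(new TextLineInputFormat(), new Path("/tmp/ref-data-snapshot"))
        .build();

// Unbounded source for the live reference-data updates.
KafkaSource<String> kafkaSource = KafkaSource.<String>builder()
        .setBootstrapServers("kafka:9092")
        .setTopics("ref-data-updates")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .setStartingOffsets(OffsetsInitializer.latest())
        .build();

// Reads the file snapshot to completion, then switches over to Kafka.
HybridSource<String> refSource = HybridSource.builder(fileSource)
        .addSource(kafkaSource)
        .build();

DataStream<String> refDataStream = env.fromSource(
        refSource, WatermarkStrategy.noWatermarks(), "reference-data");
```

The records come out as plain strings here; in practice you'd map both phases to your reference type before broadcasting, as above. Also, to avoid losing updates that arrive between the API download and the switch-over, consider capturing the Kafka offsets before you take the snapshot and starting the Kafka source from those offsets instead of from latest.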