Pre-populate a silver Delta table from a bronze table using a batch job, then stream to it from the same table


I have a pipeline like this:

kafka->bronze->silver

The bronze and silver tables are Delta tables. I'm streaming from bronze to silver using regular Spark Structured Streaming.

I changed the silver schema, so I want to reload from the bronze into silver using the new schema. Unfortunately, the reload is taking forever, and I'm wondering if I can load the data more quickly using a batch job, and then turn the stream back on.

I am concerned that the checkpoint will tell the stream from bronze->silver to pick up where it left off and it will write a bunch of duplicates that I will then need to remove. Is there a way I can advance the checkpoint with the batch load, or play other tricks?
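One pattern that avoids the duplicate problem is to not reuse the old checkpoint at all: note the bronze table's current Delta version, batch-load everything up to that version into silver, then start the stream from the *next* version with a fresh checkpoint directory. Below is a minimal sketch of that idea; the paths, the `to_silver_schema` transformation, and the checkpoint location are hypothetical placeholders, and it assumes the Delta Lake streaming source's `startingVersion` option and the `DeltaTable.history` API.

```python
# Sketch: backfill silver in one batch, then stream only the new bronze data.
# Paths and to_silver_schema are placeholders, not from the original post.
from delta.tables import DeltaTable

bronze_path = "/mnt/delta/bronze"   # hypothetical
silver_path = "/mnt/delta/silver"   # hypothetical

# 1. Note the bronze version the batch load will cover.
start_version = DeltaTable.forPath(spark, bronze_path).history(1).head()["version"]

# 2. Batch-load bronze as of that version into silver (one big transaction,
#    no per-microbatch commit overhead).
(spark.read.format("delta")
      .option("versionAsOf", start_version)
      .load(bronze_path)
      .transform(to_silver_schema)          # your existing transformation
      .write.format("delta")
      .mode("overwrite")
      .option("overwriteSchema", "true")    # silver has a new schema
      .save(silver_path))

# 3. Restart the stream from the *next* bronze version with a brand-new
#    checkpoint location, so it never replays what the batch job wrote.
(spark.readStream.format("delta")
      .option("startingVersion", start_version + 1)
      .load(bronze_path)
      .transform(to_silver_schema)
      .writeStream.format("delta")
      .option("checkpointLocation", "/mnt/checkpoints/silver_v2")  # new location
      .start(silver_path))
```

The key point is that a fresh checkpoint plus `startingVersion` effectively "advances" the stream past the batch-loaded data, rather than trying to rewrite the old checkpoint's offsets in place.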

Will that be faster than just letting the stream run? I get the feeling that it is spending a lot of resources writing microbatch transactions.
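On the microbatch-overhead point: if the goal is just to let the existing stream catch up faster, an alternative to a separate batch job is to keep the same checkpoint but drain the backlog in fewer, larger batches. A sketch, assuming Spark 3.3+'s `availableNow` trigger and again using hypothetical paths:

```python
# Keep the existing checkpoint, but process the backlog in large batches
# and stop when caught up. Paths and to_silver_schema are placeholders.
(spark.readStream.format("delta")
      .option("maxFilesPerTrigger", 10000)   # much larger microbatches than the default 1000
      .load("/mnt/delta/bronze")
      .transform(to_silver_schema)
      .writeStream.format("delta")
      .option("checkpointLocation", "/mnt/checkpoints/silver")  # same checkpoint as before
      .trigger(availableNow=True)            # process everything available, then stop
      .start("/mnt/delta/silver"))
```

This trades per-batch commit overhead for bigger batches, which is usually where a backfill spends most of its time; whether it beats a dedicated batch job depends on cluster size and how much history is in bronze.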

Any suggestions greatly appreciated!!!
