Problem with incremental data load strategy

312 Views Asked by Abhishek At 30 May 2023 at 15:02

I am building a pipeline which reads data incrementally from a source table based on the timestamp value, let's say the timestamp column name is warehouse_created_time. The value of this warehouse_created_time is assigned in the source datawarehouse as part of a batch load. The incremental pipeline needs to run multiple times a day.

The problem I am trying to solve is, there can be a situation, when the records of a long running batch will have less warehouse_created_time (as timestamp will be reserved at the start of the batch) than a short batch submitted later. And if the incremental process ran while long batch was still in progress then it will take the committed records of the short job, and therefore the target table will have records which have higher warehouse_created_time than the batch which was still running. The records of the longer batch will never be brought to the target database.

How to handle this situation, or better prevent it from happening ?

Original Q&A

Problem with incremental data load strategy

There are 0 best solutions below

Related Questions in DATABASE

Related Questions in SNOWFLAKE-CLOUD-DATA-PLATFORM

Related Questions in DATA-WAREHOUSE

Related Questions in INCREMENTAL

Trending Questions

Popular # Hahtags

Popular Questions