How can I create an incremental Azure Data Factory pipeline using a metadata table?


How can we incrementally load files from Azure Blob Storage into SQL Server or Synapse with Azure Data Factory, using a metadata table or lookup table, given that we can't query files the way we query SQL tables?

In an incremental load, the files may keep growing or may contain new or updated records. I want the complete flow, the same way we do it for SQL sources with a metadata table or lookup table. Thanks


1 Answer


In the Copy activity, you cannot query a file to get only the delta (incremental) data. Instead, you can use a Data Flow to filter the changed rows from the source and load them into the sink table. Below are the detailed steps.
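For reference, a minimal sketch of what the metadata (watermark) table might return when queried; the column names FileName and LastCopiedDate are assumptions used throughout the sketches below:

```json
[
  { "FileName": "sales_2024.csv", "LastCopiedDate": "2024-05-01 00:00:00" },
  { "FileName": "customers.csv",  "LastCopiedDate": "2024-05-03 12:30:00" }
]
```

Each row becomes one `@item()` inside the ForEach activity described below.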

  • A Lookup activity is added first. Its dataset points to the metadata table containing the list of file names and the last copied date (the date on which each file was previously copied).

  • A ForEach activity is added next, and Items in its settings is set to @activity('Lookup1').output.value, so the pipeline iterates over each file entry (a JSON sketch of this wiring follows below).

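A minimal JSON sketch of the Lookup and ForEach wiring described above, assuming a metadata table named dbo.FileLoadControl and a dataset named MetadataTable (both names are illustrative). Note that "First row only" must be unchecked so that output.value is an array:

```json
{
  "activities": [
    {
      "name": "Lookup1",
      "type": "Lookup",
      "typeProperties": {
        "source": {
          "type": "AzureSqlSource",
          "sqlReaderQuery": "SELECT FileName, LastCopiedDate FROM dbo.FileLoadControl"
        },
        "dataset": { "referenceName": "MetadataTable", "type": "DatasetReference" },
        "firstRowOnly": false
      }
    },
    {
      "name": "ForEach1",
      "type": "ForEach",
      "dependsOn": [ { "activity": "Lookup1", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "items": { "value": "@activity('Lookup1').output.value", "type": "Expression" },
        "activities": []
      }
    }
  ]
}
```

The Data Flow and Script activities from the later steps go inside the ForEach activities array.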

  • Inside the ForEach activity, a Data Flow activity is added.
  • In the Data Flow, a source transformation is added, and the file name is supplied dynamically to the source dataset through a dataset parameter (a dataset sketch follows below).

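A sketch of how the source dataset could expose a fileName parameter that is filled in at run time; the dataset, linked service, and container names are assumptions:

```json
{
  "name": "SourceBlobFile",
  "properties": {
    "type": "DelimitedText",
    "linkedServiceName": { "referenceName": "AzureBlobStorageLS", "type": "LinkedServiceReference" },
    "parameters": { "fileName": { "type": "string" } },
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "container": "input",
        "fileName": { "value": "@dataset().fileName", "type": "Expression" }
      },
      "columnDelimiter": ",",
      "firstRowAsHeader": true
    }
  }
}
```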

  • A Data Flow parameter named date of type string is created to hold the last copied date.

  • A Filter transformation is then added in the Data Flow, and its condition filters the records that changed after the previous load (see the expression sketch below).

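A sketch of the Filter transformation's condition, assuming the source files contain a column named ModifiedDate that records when each row was last changed, and using the string Data Flow parameter date from the previous step (both names are assumptions):

```
toTimestamp(ModifiedDate) > toTimestamp($date)
```

Only rows modified after the last copied date pass through to the sink, which is what makes the load incremental. If ModifiedDate is already projected as a timestamp in the source, the first toTimestamp() can be dropped.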

  • A Sink transformation is then added, and the table name for the sink dataset is supplied dynamically. (Here the sink table name is the same as the source file name.)
  • Finally, in the Data Flow activity you can supply the values for the Data Flow parameter and for the source and sink dataset parameters (a sketch of these settings follows below).
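A sketch of how the Data Flow activity inside the ForEach could pass the per-file values; the data flow name, the stream names source1/sink1, and the dataset parameter names are assumptions:

```json
{
  "name": "Dataflow1",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataflow": {
      "referenceName": "IncrementalLoadFlow",
      "type": "DataFlowReference",
      "parameters": {
        "date": { "value": "'@{item().LastCopiedDate}'", "type": "Expression" }
      },
      "datasetParameters": {
        "source1": { "fileName": { "value": "@item().FileName", "type": "Expression" } },
        "sink1":   { "tableName": { "value": "@item().FileName", "type": "Expression" } }
      }
    }
  }
}
```

The quotes inside '@{item().LastCopiedDate}' are needed because the Data Flow parameter value is evaluated as a data flow expression, so the date must arrive as a string literal.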

Then you can update the date value in the metadata (lookup) table using a Script activity placed inside the ForEach activity, immediately after the Data Flow activity, so the next run only picks up rows changed since this load.
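A sketch of that Script activity, again assuming the illustrative dbo.FileLoadControl table and a SQL linked service named AzureSqlDatabaseLS; it stamps the current UTC time as the new last copied date for the file just processed:

```json
{
  "name": "UpdateWatermark",
  "type": "Script",
  "dependsOn": [ { "activity": "Dataflow1", "dependencyConditions": [ "Succeeded" ] } ],
  "linkedServiceName": { "referenceName": "AzureSqlDatabaseLS", "type": "LinkedServiceReference" },
  "typeProperties": {
    "scripts": [
      {
        "type": "NonQuery",
        "text": {
          "value": "UPDATE dbo.FileLoadControl SET LastCopiedDate = '@{utcNow()}' WHERE FileName = '@{item().FileName}'",
          "type": "Expression"
        }
      }
    ]
  }
}
```

A common variant is to record the maximum ModifiedDate actually loaded instead of utcNow(), which avoids missing rows written to the file while the pipeline is running.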