Load Parquet Files from ADLS Gen2 using ADF


I would like to set up an ADF pipeline that loads all the Parquet files stored on ADLS Gen2 over the past 2+ years, organized in a Year -> Month -> Day -> Hour -> Min folder hierarchy. Over that period the file structure changed, with 2-3 columns varying between versions. I would like to pull all the common columns and load the entire data set into a SQL table. Can someone please point me to resources that could help with this requirement?

Thank you!

Answer:


In the Azure Data Factory pipeline:

  1. Use the Get Metadata activity (with the Child items field selected) to get the list of Parquet files.
  2. Pass the child items output to a ForEach activity so it iterates over each item.
  3. Inside the ForEach activity, add an If Condition activity that checks whether the date derived from the file's folder path is later than the current time minus 2 years.
  4. In the If Condition's True activities, add a Copy data activity that copies the file's data from the source to the sink.
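The date check in steps 3 and 4 can be sketched outside ADF to make the logic concrete. This is a minimal Python sketch, assuming the folder levels are zero-padded numbers laid out as `YYYY/MM/DD/HH/mm/<file>.parquet` and that "minus 2" means two years; the function names are hypothetical helpers, not ADF features.

```python
from datetime import datetime
from typing import Iterable, List


def folder_datetime(path: str) -> datetime:
    """Parse an assumed 'YYYY/MM/DD/HH/mm/<file>.parquet' path into a datetime.

    Assumption: the five folder levels directly above the file name are
    Year/Month/Day/Hour/Min; any prefix (e.g. a container name) is ignored.
    """
    parts = path.strip("/").split("/")
    year, month, day, hour, minute = (int(p) for p in parts[-6:-1])
    return datetime(year, month, day, hour, minute)


def years_ago(now: datetime, years: int) -> datetime:
    """Shift `now` back by whole years, clamping Feb 29 to Feb 28 when needed."""
    try:
        return now.replace(year=now.year - years)
    except ValueError:
        return now.replace(year=now.year - years, day=28)


def files_to_copy(paths: Iterable[str], now: datetime, years: int = 2) -> List[str]:
    """Mimic ForEach + If Condition: keep only files newer than now minus `years`."""
    cutoff = years_ago(now, years)
    return [p for p in paths if folder_datetime(p) > cutoff]
```

For example, with `now = datetime(2024, 3, 1)` the cutoff is 2022-03-01, so a file under `raw/2021/01/10/00/00/` is skipped while one under `raw/2023/06/01/12/30/` is copied. In the pipeline itself the same comparison would live in the If Condition expression, applied to the folder path of each item.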

For configuring the SQL table as the sink, refer to the Azure Data Factory documentation on copying data to Azure SQL Database.
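The question also asks to keep only the columns shared by every file version. One way to work out that column list before setting up the Copy activity's mapping is sketched below in Python. It assumes you already have the column names of each schema version (for example, read from each file's Parquet schema); the helper names are hypothetical, and column names are compared exactly, including case.

```python
from typing import Dict, Iterable, List, Sequence


def common_columns(schemas: Sequence[Sequence[str]]) -> List[str]:
    """Return columns present in every schema, in the order of the first schema.

    Assumption: each entry in `schemas` is the list of column names of one
    file version; names must match exactly (case-sensitive).
    """
    if not schemas:
        return []
    shared = set(schemas[0])
    for cols in schemas[1:]:
        shared &= set(cols)
    return [c for c in schemas[0] if c in shared]


def project_rows(
    rows: Iterable[Dict[str, object]], columns: Sequence[str]
) -> List[Dict[str, object]]:
    """Keep only `columns` in each row so every version fits one SQL table."""
    return [{c: row.get(c) for c in columns} for row in rows]
```

For instance, if one version has `id, ts, amount, region` and another has `id, ts, amount, channel`, the shared list is `id, ts, amount`, which is the set of columns the SQL table and the Copy activity's column mapping would need to cover.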