I have partitioned parquet files in Azure Blob that I am copying to Azure SQL. How do I get the partition name into the SQL table?
I've figured out how to get the full file path into the SQL table by adding an Additional Column in the source data section of the Copy Activity (image 1 & 2), but I'm trying to figure out how to regex the full file path down to just the partition name (202105).
In the data preview for the source data in the Copy Activity, it shows the time_period column with just the partition name (image 3). But when it shows up in SQL it is NULL for all rows (or it's the full file path, depending on if I added Additional Columns in the source data section of the Copy Activity).
I've tried changing the data type for time_period to an INT in Azure SQL. I've tried parsing the $$FILEPATH, but nothing I've tried has worked.
I'm basically starting from scratch as I'm sure there's a better. Extra background here and possibly here.
Similar to this



As explained here in MS doc, you can utilize
enablePartitionDiscoveryfeature.Source: partitioned files:
Source Dataset:
Just mentioned the container name and leave the directory and file fields empty. We shall filter them using
WildCard pathsin Copy Activity.Configure source in
Copy Activitywith respect to your files path:Note: you can skip the step 4 i.e. additional column with
$$FILEPATH, just shown for reference. You can drop this bit as you already get the ready column usingenablePartitionDiscovery.For a single folder to be picked, you will set as below.
Wildcard paths:
sink / columnparts / time_period=202105 / *.parquetFor multiple folders
time_period=202105,time_period=202106..... as seen in previous sinp, set as below.**will take the place of any folder in the parent foldercolumnpartsWildcard paths:
sink / columnparts / ** / *.parquetPartition root Path: This should point to the parent folder where all the partitioned folders rest.
In my example:
sink/columnpartspartition root path must be provided when you enable partition discovery.
Sink: Optional update existing table or just create a new one.
View from SQL DB:
time_periodcolumn holds the value202105If you see this error:
You have a mapping that is not updated! In the mapping section, you can
clearorresetschema andImport schemaagain just to be sure.In my case it was additional column
file_path--OR--
$$FILEPATHis a reserved variable, you cannot use it in expression builder or in functions to manipulate.Instead if you can incorporate a step after you copy to SQL DB i.e. use a stored procedure as below.
Where column
pathholds the full file path received from$$FILEPATHas you have managed already.StoreParquetTestis the table created in SQLsinkNow you can use the stored procedure activity in Pipeline after Copy Activity.