Making a dynamic pipeline in Azure Data Factory


I have an Azure Blob Storage container with 8 .csv files in it. I want the pipeline to auto-create a table in the Azure SQL database for each file, and if a table already exists, truncate it and fill it again when the file is renewed.

Here is what I have so far:

  1. Get Metadata activity - to get all the files from the storage.
  2. ForEach activity - iterating over each item returned by the Get Metadata activity.

Inside the ForEach:

  1. Set variable - the table name, i.e. the file name minus the .csv extension.
  2. Lookup activity - to get the metadata from the database, such as the delimiter and the Azure function.
  3. Copy data activity - to copy each file from the blob storage to the SQL database, with auto-create table if it does not exist.

My goal is to make a dynamic pipeline that loads all the data files from the blob storage into the SQL database, so that when a new file lands in the storage, the pipeline only has to run again to put it into the SQL database.

Here are some examples of the files:

01_data.csv
Jaar,Special,Data
2023,cd,september
2022,ps,oktober

02_data.csv
JAAR,mechanic,Data
2021,paus,march
2000,klank,januari

top100.csv
number, id, place,Jaar
1,1234,99,2000

Do you get the picture?

I get an error saying a failure happened on the sink side of the Copy data activity.

I tried a lot, but nothing works, so can anyone help?

1 Answer (accepted):

SQL Server doesn't support table names that start with a number, like yours, unless the name is delimited with brackets, so you need to cross-check those.
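For example, here is a minimal T-SQL sketch (using the columns from 01_data.csv; the column types are assumed) showing that a name starting with a digit only works as a delimited identifier:

-- Works: the name starting with a digit is delimited with brackets
CREATE TABLE [01_data] (Jaar INT, Special VARCHAR(50), Data VARCHAR(50));

-- Fails with a syntax error: an undelimited identifier cannot start with a digit
-- CREATE TABLE 01_data (Jaar INT, Special VARCHAR(50), Data VARCHAR(50));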

I want to get all of the files to auto-create a table in the Azure SQL database, and if it exists, truncate the table and fill it again when the file is renewed.

You can use the Auto create table option and a pre-copy script in the copy activity sink to achieve your requirement.

In this case, you need two pipelines. One is for the existing files and will be run manually only once at the start.

The second pipeline is for newly created files and will be executed by a storage event trigger.

Follow the approach below for the first pipeline.

These are my starting source files.


I have a sample2 table with existing data.


First, use the Get Metadata activity to get the file list and pass that array of files to the ForEach activity.
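As a sketch, assuming the Get Metadata activity is named Get Metadata1 and has Child items selected in its field list, the Items setting of the ForEach would be:

@activity('Get Metadata1').output.childItems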

Inside the ForEach, extract the table name from the file name.
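For example, a Set variable activity (the variable name table_name is only an assumption here) can strip the extension from the current item's file name with this expression:

@replace(item().name, '.csv', '')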

Add the copy activity and use a dataset parameter for the file name of the source file. Give the SQL dataset as the sink and use a dataset parameter for the table name, like below.
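As a rough sketch (the parameter name table_name is an assumption), the Table field of the parameterized SQL dataset is bound to that parameter with:

@dataset().table_name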


In the copy activity sink, pass the table name stored in the variable to the dataset parameter. Select Auto create table and put the code below in the pre-copy script.

IF OBJECT_ID('@{variables('table_name')}') IS NOT NULL TRUNCATE TABLE @{variables('table_name')};
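If you prefer to keep file names that start with a digit instead of renaming them, a bracketed variant of the same script (a sketch, assuming the tables live in the default schema) avoids the identifier issue in the truncate:

IF OBJECT_ID('[@{variables('table_name')}]') IS NOT NULL TRUNCATE TABLE [@{variables('table_name')}];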


Now debug the pipeline; you can see that the data in sample2 was truncated and new data inserted, and the other two tables were created.


Now for the second pipeline, use a storage event trigger. Get the name of the newly created file from the trigger parameter @triggerBody().fileName.
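A minimal sketch of that wiring, where the pipeline parameter name fileName is an assumption: add a string parameter to the pipeline, map it in the trigger settings, and derive the table name from it inside the pipeline.

Trigger parameter value (in the trigger's pipeline settings):  @triggerBody().fileName
Table name expression inside the pipeline:                     @replace(pipeline().parameters.fileName, '.csv', '')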

Use the same copy activity with the pre-copy script as above, but don't use any loop, as we are not dealing with multiple files this time.

Go through this SO answer to understand the storage event trigger.