DataFactory copies files multiple times when using wildcard

Hi all, complete ADF newbie here. I have a strange issue with DataFactory and, surprisingly, can't find anyone else who has experienced the same thing.

To summarize:

  1. I have set up a basic copy activity from blob to an Azure SQL database with no transformation steps
  2. I have set up a trigger based on a wildcard name, i.e. any files loaded to blob whose names start with IDT* will be copied to the database
  3. I have loaded a few files to a specific location in Azure Blob
  4. The trigger is activated
  5. At first it looks like it all works, but a quick check of the record count shows that the same files have been imported X number of times

I have analysed what is happening. When I load my files to blob, they don't technically arrive at exactly the same time. So when file 1 hits the blob, the trigger fires, the wildcard search runs and it finds 1 file. When the 2nd file hits the blob some milliseconds later, the trigger fires again and this time the wildcard search processes 2 files (the first and the second).

The problem keeps compounding with the number of files loaded.
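
To illustrate the compounding, here is a rough Python sketch. It is not ADF code, just a simulation of the behaviour described above. It assumes every arriving file fires one run, that each run copies every file matching the wildcard at that moment, and that nothing is deleted between runs. The function name and the file names are made up for the example.

```python
def simulate_wildcard_trigger(file_names):
    """Count how many times each file is copied when every arrival
    fires a run that copies all files currently in the container."""
    landed = []        # files that have arrived so far
    copy_counts = {}   # file name -> number of times it was copied
    for name in file_names:
        landed.append(name)  # a new file lands and fires the trigger
        # Assumption: the wildcard copy picks up everything present.
        for present in landed:
            copy_counts[present] = copy_counts.get(present, 0) + 1
    return copy_counts


# Hypothetical file names matching the IDT* pattern.
files = ["IDT_1.csv", "IDT_2.csv", "IDT_3.csv"]
counts = simulate_wildcard_trigger(files)
print(counts)                # per-file copy counts
print(sum(counts.values()))  # total number of copies across all runs
```

Under those assumptions the first of N files is copied N times and the last one once, so N files produce N(N+1)/2 copies in total, which matches the duplication I am seeing.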

I have tried multiple things to fix this, to no avail, because fundamentally ADF is behaving "correctly".

I have tried:

  1. Deleting the file once it has been processed, but again due to the millisecond issue the file is technically still there and can still be picked up by another run
  2. Adding a loop to process 1 file at a time, deleting each file (by file name in the blob) before the next one is loaded. This hasn't worked, and I can't explain why
  3. Limiting ADF to only 1 concurrent connection. This reduces the number of duplicates but unfortunately still duplicates the files
  4. Putting a wait timer at the start of the copy activity, but this causes a resource locking issue. I get an error saying that multiple waits are causing the process to fail
  5. A combination of 1, 2 and 3, which ends up with an entirely different issue: it tries to find file X, which no longer exists because it was deleted as part of step 2 above

I am really struggling with something that seems extremely basic, so I am sure I am overlooking something fundamental, as no one else seems to have this issue with ADF.
