ADF Yearly slices, cannot get data from current year

I'm having trouble configuring a pipeline in Azure Data Factory.

Every day I need to upload the data from the last two years, that is, all of 2016 plus 2017 up to today. I set up a pipeline with the WindowStart date as 2016/01/01 and the WindowEnd date as 2018/01/01, and configured the Output with Availability->Frequency as Months:12. This generates 2 slices, one for 2016 and one for 2017.

The slice for 2016 runs successfully, but the slice for 2017 is always stuck on "Pending execution". My guess is that, since 2017 is not complete yet, it won't run.

Is there a way to force it to run even if the WindowEnd is in the future?

I've tried creating some chain activities by month and then grouping them, but none of my tests has given me what I need.

I feel like the problem is not that rare, because even if you want to do it monthly, the current month won't be uploaded. The problem comes from the fact that the WindowStart must match the SliceStart, and the WindowEnd must match the SliceEnd.

Am I missing something? Any suggestions?

There are 2 best solutions below

On BEST ANSWER

When you deploy the pipeline, all slices are created. By default, however, each slice is scheduled only once its end has been reached, so your 2017 slice would run only when 2017 is over.

You can specify "style": "StartOfInterval" to change the behavior.
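
To make the difference concrete, here is a small Python model of the scheduling rule described above. It is only a sketch of the behaviour, not ADF code: the helper names `yearly_slices` and `is_due` are made up for illustration, and it assumes that a slice becomes due at its end by default ("EndOfInterval") and at its start with "StartOfInterval".

```python
from datetime import date

def yearly_slices(start, end):
    """Split [start, end) into 12-month slices (hypothetical helper).

    Assumes `start` is not Feb 29, so adding one year is always valid.
    """
    slices = []
    current = start
    while current < end:
        nxt = date(current.year + 1, current.month, current.day)
        slices.append((current, min(nxt, end)))
        current = nxt
    return slices

def is_due(slice_window, today, style="EndOfInterval"):
    """Simplified rule: a slice is due once its trigger point has passed."""
    slice_start, slice_end = slice_window
    trigger = slice_start if style == "StartOfInterval" else slice_end
    return trigger <= today

slices = yearly_slices(date(2016, 1, 1), date(2018, 1, 1))
today = date(2017, 6, 15)  # an arbitrary day in 2017, chosen for the example

for window in slices:
    print(window[0].year,
          is_due(window, today),                      # default style
          is_due(window, today, "StartOfInterval"))   # changed style
```

With the default style only the 2016 slice is due on that day; with "StartOfInterval" the 2017 slice is due as well, which matches what the question asks for.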

By design, slices are meant to process discrete, non-overlapping periods of time. Typically, if you process your dataset daily, each day you process the data for the previous day, so each slice covers one specific day and the slices can be accumulated.

If you want it to execute daily, you would need to set the availability of your source to Day: 1.

Depending on your source, you can use the WindowStart parameter to filter the source, using only the year part for example.

For example, if the source of the data is an SQL database and you use a stored proc to select the data to extract, you could pass only the year to your stored proc, knowing that this will be the year of the current day. Then you can use that in your proc to filter the records corresponding to the current and the previous year.

    "typeProperties": {
        "storedProcedureName": "dbo.your_stored_proc",
        "storedProcedureParameters": {
            "year": "$$Text.Format('{0:yyyy}', SliceStart)"
        }
    }
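
As an illustration of the filtering the stored proc would then do with that year, here is a sketch in Python. The real proc would of course be SQL; the function name `year_filter_range` and the sample dates are assumptions made up for this example.

```python
from datetime import date

def year_filter_range(year):
    """Return the half-open range [Jan 1 of previous year, Jan 1 of next year)."""
    return date(year - 1, 1, 1), date(year + 1, 1, 1)

slice_start = date(2017, 6, 15)           # example slice start
year = int(slice_start.strftime("%Y"))    # same value '{0:yyyy}' would yield
range_start, range_end = year_filter_range(year)

# Made-up record dates to show which ones pass the filter.
records = [date(2015, 12, 31), date(2016, 1, 1),
           date(2017, 6, 14), date(2018, 1, 1)]
kept = [d for d in records if range_start <= d < range_end]
print(kept)
```

Only the records dated in 2016 and 2017 are kept, which is the "current and previous year" window described above.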

Does this make sense?

I suggest reading this post to learn more about scheduling in ADF:

https://blogs.msdn.microsoft.com/ukdataplatform/2016/05/03/demystifying-activity-scheduling-with-azure-data-factory/

On

You need to set the availability style of both your pipeline and your datasets to "StartOfInterval", like this:

    "availability": {
        "frequency": "Month",
        "interval": 12,
        "style": "StartOfInterval"
    }