Custom Pagination of Rest API in Azure Data Factory


I would like to retrieve all results from a REST API endpoint. The URL has the following form:

 https://myapi.com/relativeapi?project=&repo=&prId=&page=&pageSize=&startTime=&endTime

By default, a request returns only the first page. A sample response is shown below:

    {
        "pageSize": 50,
        "num": 50,
        "isLastPage": false,
        "data":
            {"ABC":{"mock1":[{"Id":18,"Date":"202104T02:04:53.000Z","attr1":0,"attr2":0,"attr3":0,"historyData":[{"Date":"2021-11-03T00:08:13.000Z","attr1":0,"attr2":0,"attr3":0,"attr4":{}}
    ...

How can we achieve this in Azure Data Factory and retrieve the results from all pages (the last page is reached when "isLastPage" is true and "data" is empty)?

Also, how can we request the API data incrementally, so that the pipeline does not need to fetch all results from the beginning (page 1) on every run, but instead resumes from the last updated page?

BEST ANSWER

@christi08

Since the next-page information is not returned in the output, unfortunately you will not be able to make use of the built-in pagination feature.

As a workaround, you could use an iterative approach to achieve your end goal.

Step 1:

Your requests will take the following form:

 https://myapi.com/relativeapi?page=1.......
 https://myapi.com/relativeapi?page=2.......
 https://myapi.com/relativeapi?page=3.......
 https://myapi.com/relativeapi?page=n.......
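The URL pattern above can be generated with a short Python sketch. The base URL and parameter names come from the question; the helper `build_page_url` and the values passed for `project` and `pageSize` are hypothetical placeholders, since the question leaves them empty.

```python
from urllib.parse import urlencode

BASE_URL = "https://myapi.com/relativeapi"


def build_page_url(page: int, **other_params: str) -> str:
    """Return the request URL for one page (hypothetical helper).

    `page` is placed first; any other query parameters (project, repo,
    prId, pageSize, ...) follow in the order they are passed.
    """
    query = urlencode({"page": page, **other_params})
    return f"{BASE_URL}?{query}"


# Print the URLs for pages 1 to 3, with placeholder values for the other parameters.
for page in range(1, 4):
    print(build_page_url(page, project="demo", pageSize="50"))
```

Only `page` changes between requests, which is exactly what the pipeline variable in the next steps has to drive.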

Step 2:

Create a variable named pageno at the pipeline level.


Step 3:

In the REST dataset used by the REST connector, create a parameter named page.


Add this page parameter to the relative URL, along with the other query parameters and the path.


In your case, the base URL will be different.
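To illustrate how the page parameter turns into a full request URL, here is a small Python sketch. It assumes the dataset's relative URL is written as dynamic content along the lines of `relativeapi?page=@{dataset().page}` and that the base URL lives in the linked service. `urljoin` is only a stand-in for however ADF combines the two, and both helper names are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical base URL configured on the REST linked service.
BASE_URL = "https://myapi.com/"


def dataset_relative_url(page: int) -> str:
    """Mimic a relative URL such as relativeapi?page=@{dataset().page}."""
    return f"relativeapi?page={page}"


def request_url(page: int, base_url: str = BASE_URL) -> str:
    """Resolve the dataset's relative URL against the base URL."""
    return urljoin(base_url, dataset_relative_url(page))


print(request_url(3))
print(request_url(1, "https://other.example/api/"))
```

With a trailing slash on the base URL, `urljoin` appends the relative part instead of replacing the last path segment; ADF's own joining rules may differ, so check the resolved URL in a debug run.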

Step 4:

Now, in the Copy activity's Source settings, pass the value of the pipeline variable to the page parameter.

This pipeline variable will be incremented on each iteration.


So on each iteration, pageno is incremented, and therefore the relative URL is dynamic as well.

You would need a Set Variable activity to increment the pageno pipeline variable.

To loop, you could use an Until activity.
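The loop from Steps 2 to 4 can be sketched as a plain Python simulation (not ADF code). It assumes the Until activity behaves like a do-until loop, running its inner activities before testing the end condition. It also models the increment as two assignments, because a Set Variable activity generally cannot reference the variable it is setting, so a hypothetical temporary variable holds pageno + 1 first. `run_until`, `is_done` and `max_iterations` are illustrative names; `max_iterations` stands in for the Until activity's timeout.

```python
from typing import Callable, List


def run_until(is_done: Callable[[int], bool], max_iterations: int = 100) -> List[int]:
    """Simulate an Until activity that copies one page per iteration.

    Returns the page numbers requested, in order.
    """
    pageno = 1                        # pipeline variable, default value 1
    requested: List[int] = []
    for _ in range(max_iterations):   # stands in for the Until activity's timeout
        current = pageno
        requested.append(current)     # Copy activity: read page `current`
        temp = pageno + 1             # Set Variable 1: temp = pageno + 1
        pageno = temp                 # Set Variable 2: pageno = temp
        if is_done(current):          # end condition is tested after the body runs
            break
    return requested


print(run_until(lambda page: page >= 3))  # [1, 2, 3]
print(run_until(lambda page: True))       # [1]: the body always runs once
```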

End condition:

For the Until activity to end, you will have to provide an expression that evaluates to true once the last page has been processed. There are two options:

  1. You could add another Web activity / Lookup activity that uses the same dynamic relative URL, then access its output and read the isLastPage node, looping until it is true.

  2. You could access the Copy activity output, check whether the number of rows written is 0, and end the Until activity when it is.
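Both end conditions can be tried out with a small Python simulation. Everything here is hypothetical: `FAKE_PAGES` is a made-up three-page response set shaped loosely like the sample in the question (with a simplified `data` object), and `fetch_page` stands in for the Web/Lookup activity call. In the pipeline itself, the Until expression might look something like `@equals(activity('Lookup1').output.firstRow.isLastPage, true)` for option 1 or `@equals(activity('Copy data1').output.rowsCopied, 0)` for option 2, where the activity names are placeholders.

```python
from typing import Any, Dict, List

# Hypothetical three-page API: the last page has isLastPage=true and empty "data".
FAKE_PAGES: Dict[int, Dict[str, Any]] = {
    1: {"pageSize": 2, "num": 2, "isLastPage": False, "data": {"rows": [1, 2]}},
    2: {"pageSize": 2, "num": 2, "isLastPage": False, "data": {"rows": [3, 4]}},
    3: {"pageSize": 2, "num": 0, "isLastPage": True, "data": {}},
}


def fetch_page(page: int) -> Dict[str, Any]:
    """Stand-in for the Web/Lookup activity that calls the dynamic relative URL."""
    return FAKE_PAGES[page]


def rows_in(response: Dict[str, Any]) -> List[int]:
    """Rows the Copy activity would write for this page (simplified data shape)."""
    return response.get("data", {}).get("rows", [])


def is_last_page(response: Dict[str, Any]) -> bool:
    """Option 1: stop when the API flags the last page or returns no data."""
    return bool(response.get("isLastPage")) or not response.get("data")


def collect_all(use_row_count: bool = False, max_pages: int = 100) -> List[int]:
    """Fetch pages from 1 upwards until the chosen end condition is met."""
    collected: List[int] = []
    for pageno in range(1, max_pages + 1):
        response = fetch_page(pageno)
        rows = rows_in(response)
        collected.extend(rows)                # Copy activity writes this page
        if use_row_count:
            done = len(rows) == 0             # Option 2: rows written is 0
        else:
            done = is_last_page(response)     # Option 1: isLastPage is true
        if done:
            break
    return collected


print(collect_all())                    # [1, 2, 3, 4]
print(collect_all(use_row_count=True))  # [1, 2, 3, 4]
```

Option 1 costs an extra API call per page but relies on the API's own flag; option 2 needs no extra call, but only works if the final page really produces zero copied rows.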