I'm using the copy data activity in Azure Data Factory to copy data from an API to our data lake for alerting & reporting purposes. The API response is comprised of multiple complex nested JSON arrays with key-value pairs. The API is updated on a quarter-hourly basis and data is only held for 2 days before falling off the stack. The API adopts an oldest-to-newest record structure and so the newest addition to the array would be the final item in the array as opposed to the first.

My requirement is to copy only the most recent record from the API as opposed to the collection - so the 192th reading or item 191 of the array (with the array starting at 0.) Due to the nature of the solution, there are times when the API isn't being updated as the sensors that collect and send over the data to the server may not be reachable.

The current solution is triggered every 15 minutes and tries a copy data activity of item 191, then 190, then 189 and so on. After 6 attempts it fails and so the record is missed.

current pipeline structure

I have used the mapping tab to specify the items in the array as follows (copy attempt 1 example):

$['meta']['params']['sensors'][*]['name']
$['meta']['sensorReadings'][*]['readings'][191]['dateTime']
$['meta']['sensorReadings'][*]['readings'][191]['value']

Instead of explicitly referencing the array number, I was wondering if it is possible to reference the last item of the array in the above code? I understand we can use 0 for the first record however I don't understand how to reference the final item. I've tried the following using the 'last' function but am unsure of how to place it:

$['meta']['sensorReadings'][*]['readings'][last]['dateTime']
$['meta']['sensorReadings'][*]['readings']['last']['dateTime']
last['meta']['sensorReadings'][*]['readings']['dateTime']
$['meta']['sensorReadings'][*]['readings']last['dateTime']

Example of my API response

Any help or advice on a better way to proceed would be greatly appreciated.

1

There are 1 best solutions below

1
wBob On

Can you call your API with a Web activity? If so, this pulls the API result into the data pipeline and then apply ADF functions like last to it.

A simple example calling the UK Gov Bank Holidays API:

enter image description here

This returns a resultset that looks like this:

{
    "england-and-wales": {
        "division": "england-and-wales",
        "events": [
            {
                "title": "New Year’s Day",
                "date": "2017-01-02",
                "notes": "Substitute day",
                "bunting": true
            },
            {
                "title": "Good Friday",
                "date": "2017-04-14",
                "notes": "",
                "bunting": false
            },
            {
                "title": "Easter Monday",
                "date": "2017-04-17",
                "notes": "",
                "bunting": true
            },
            ... etc

You can now apply the last function to is, e.g. using a Set Variable activity:

@string(last(activity('Web1').output['england-and-wales'].events))

Which yields the last bank holiday of 2023:

{
    "name": "varWorking",
    "value": "{\"title\":\"Boxing Day\",\"date\":\"2023-12-26\",\"notes\":\"\",\"bunting\":true}"
}

Or

@string(last(activity('Web1').output['england-and-wales'].events).date)