How to select the last element of a JSON array in AWS Athena?

169 Views Asked by At

My database has a column that contains a JSON array like this:

WITH dataset AS (
    SELECT *
    FROM (VALUES 
        ('1', JSON'[{ "name" : "foo" }, { "name" : "bar" }]'),
        ('2', JSON'[{ "name" : "fizz" }, { "name" : "buzz" }]'),
        ('3', JSON'[{ "name" : "hello" }, { "name" : "world" }]')
    ) AS t(id, my_array)
)

I want to select the name property from the last element in the array:

result
bar
buzz
world

I can select the first element easily enough:

SELECT json_extract_scalar(my_array, '$[0].name') FROM dataset

These attempts did not work for the last element:

SELECT json_extract_scalar(my_array, '$[-1].name') FROM dataset

SELECT json_extract_scalar(my_array, '$[cardinality(json_parse(my_array)) - 1].name') FROM dataset

SELECT element_at(my_array, -1) FROM dataset

Note: I cannot make any assumptions about the length of the JSON array.

3

There are 3 best solutions below

2
On BEST ANSWER

In addition to your answer.

  1. json_extract/json_extract_scalar work with quite limited subset of JSON path queries so one way to achieve your goal is to cast to array. Few notes:

    • No need for json_extract you can cast JSON type directly, if column is of string type use json_parse before cast.
    • You can use JSON as target type, i.e. array(json) or array(map(varchar, json)), which can be useful in case of mixed content inside array/JSON object properties:
    SELECT element_at(CAST(my_array AS ARRAY(MAP(VARCHAR, JSON))), -1)['name']
    FROM dataset;
    
  2. Trino, which should be the base for Athena since 3rd version has several advanced functions to use with JSON path like json_query or json_value, which can be used in this case:

    SELECT json_value(json_format(my_array), 'lax $[last].name')
    FROM dataset;
    
1
On

You can do so like this:

SELECT element_at(my_column, -1) 
FROM your_table_name;

More info:

https://docs.aws.amazon.com/athena/latest/ug/accessing-array-elements.html

0
On

Got it working:

  1. Cast the JSON array to an array
  2. Use element_at on the array
WITH dataset AS (
    SELECT *
    FROM (VALUES 
        ('1', JSON'[{ "name" : "foo" }, { "name" : "bar" }]'),
        ('2', JSON'[{ "name" : "fizz" }, { "name" : "buzz" }]'),
        ('3', JSON'[{ "name" : "hello" }, { "name" : "world" }]')
    ) AS t(id, my_array)
)

SELECT element_at(CAST(json_extract(my_array, '$') AS ARRAY(MAP(VARCHAR, VARCHAR))), -1)['name']
FROM dataset

Reference: https://docs.aws.amazon.com/athena/latest/ug/searching-for-values.html