turn data in json from wide to long format using trino

76 Views Asked by At

ok so i have an s3 folder with a lot of json files in it. the json files look like this:

{responses:[{
  "QID1":3,
  "QID2_1":4,
  "QID2_2:"hallo"
},...]}

i do not know how many distinct QIDxy values there are. I know that all the keys i am interested in start with "QID" followed by some combination of digits and underscores.

I use trino to query those folders. So for this example i would normally do it like this.

CREATE TABLE responses (
  responses ARRAY(
    ROW(
      QID1 int,
      QID2_1 int,
      QID2_2 varchar
    )
)
WITH (
  format = 'json',
  external_location = 's3://xyz'
);

But in this case i do not know which QIDs - column names - there are. So I would like to dynamically build a long format, that has all the keys of the json objects in it an should look like this.

question_id answer_int answer_str
QID1 3 null
QID2_1 4 null
QID2_2 null "hallo"
... ... ...

any idea how i can do this without knowing the concrete column names and without preprocessing the json files. Maybe there is something like trino integrated jslt that could be used.

0

There are 0 best solutions below