ok so i have an s3 folder with a lot of json files in it. the json files look like this:
{responses:[{
"QID1":3,
"QID2_1":4,
"QID2_2:"hallo"
},...]}
i do not know how many distinct QIDxy values there are. I know that all the keys i am interested in start with "QID" followed by some combination of digits and underscores.
I use trino to query those folders. So for this example i would normally do it like this.
CREATE TABLE responses (
responses ARRAY(
ROW(
QID1 int,
QID2_1 int,
QID2_2 varchar
)
)
WITH (
format = 'json',
external_location = 's3://xyz'
);
But in this case i do not know which QIDs - column names - there are. So I would like to dynamically build a long format, that has all the keys of the json objects in it an should look like this.
question_id | answer_int | answer_str |
---|---|---|
QID1 | 3 | null |
QID2_1 | 4 | null |
QID2_2 | null | "hallo" |
... | ... | ... |
any idea how i can do this without knowing the concrete column names and without preprocessing the json files. Maybe there is something like trino integrated jslt that could be used.