I am trying to import data in the following format into a hive table
[
{
"identifier" : "id#1",
"dataA" : "dataA#1"
},
{
"identifier" : "id#2",
"dataA" : "dataA#2"
}
]
I have multiple files like this and I want each {} to form one row in the table. This is what I have tried:
CREATE EXTERNAL TABLE final_table(
identifier STRING,
dataA STRING
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"
This is not creating a single row for each {} though. I have also tried
CREATE EXTERNAL TABLE final_table(
rows ARRAY< STRUCT<
identifier: STRING,
dataA: STRING
>>
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"
but this is not work either. Is there some way of specifying that the input as an array with each record being an item in the array to the hive query? Any suggestions on what to do?
JSON records in data files must appear one per line, an empty line would produce a NULL record.
This json should work
{ "identifier" : "id#1", "dataA" : "dataA#1" }, { "identifier" : "id#2", "dataA" : "dataA#2" }