AWS Athena JSON format failure

4.3k Views Asked by At

I am trying to upload the data from this json:

JSON-A

[
  {"name": "james", "id": 41},
  {"name": "scott", "id": 62},
  {"name": "abhi", "id": 16},
  {"name": "kevin", "id": 53},
  {"name": "beau", "id": 12},
  {"name": "shally", "id": 35},
  {"name": "jude", "id": 53},
  {"name": "jason", "id": 77},
  {"name": "hongjian", "id": 35},
  {"name": "madhur", "id": 6}
]

If it had the parent "data" key like

JSON-B

["data":{"name": "james","id": 41"},{.....}]

then I know I can do like :

CREATE EXTERNAL TABLE IF NOT EXISTS test.test (
  `data` array<struct<`name`:string,`id`:bigint>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://test-bucket/';

But what should be the way to do for JSON-A?

2

There are 2 best solutions below

0
On

I do not think this will work, because of the Hive/SerDe assumption that your text file contain one record per line. From the org.openx.data.jsonserde.JsonSerDe documentation:

  1. One Record Per Line - although I'm not sure your sample is meant to be literal in the multiline respect, the sample does contain multiple records in the same line.

IMPORTANT!!! READ THIS BELOW!! Json records must be one per line, that is, the serde WILL NOT WORK with multiline Json.

  1. Array Examples One-per-Line - the serde examples do show how to work with arrays, but again, not a top-level array encompassing multiple records in the same line.
0
On

You will have to hold a file in 's3://test-bucket/':
{"name": "james", "id": 41} {"name": "scott", "id": 62} {"name": "abhi", "id": 16} {"name": "kevin", "id": 53} {"name": "beau", "id": 12} {"name": "shally", "id": 35} {"name": "jude", "id": 53} {"name": "jason", "id": 77} {"name": "hongjian", "id": 35} {"name": "madhur", "id": 6}