How do I import an array of data into separate rows in a hive table?

1.1k Views Asked by shrewquest At 27 November 2025 at 16:41

I am trying to import data in the following format into a hive table

[
    {
      "identifier" : "id#1",
      "dataA" : "dataA#1"
    },
    {
      "identifier" : "id#2",
      "dataA" : "dataA#2"
    }
]

I have multiple files like this and I want each {} to form one row in the table. This is what I have tried:

CREATE EXTERNAL TABLE final_table(
    identifier STRING,
    dataA STRING
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"

This is not creating a single row for each {} though. I have also tried

CREATE EXTERNAL TABLE final_table(
    rows ARRAY< STRUCT<
    identifier: STRING,
    dataA: STRING
    >>
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"

but this is not work either. Is there some way of specifying that the input as an array with each record being an item in the array to the hive query? Any suggestions on what to do?

Original Q&A

There are 2 best solutions below

sandy kay On 28 November 2017 at 10:07

JSON records in data files must appear one per line, an empty line would produce a NULL record.

This json should work

{ "identifier" : "id#1", "dataA" : "dataA#1" }, { "identifier" : "id#2", "dataA" : "dataA#2" }

Ramesh On 28 November 2017 at 17:53

Here is what you need

Method 1: Adding name to the array

Data

{"data":[{"identifier" : "id#1","dataA" : "dataA#1"},{"identifier" : "id#2","dataA" : "dataA#2"}]}

SQL

SET hive.support.sql11.reserved.keywords=false;

CREATE EXTERNAL TABLE IF NOT EXISTS ramesh_test (
  data array<
    struct<
      identifier:STRING, 
      dataA:STRING
    >
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 'my_location';

SELECT rows.identifier,
       rows.dataA
  FROM ramesh_test d
LATERAL VIEW EXPLODE(d.data) d1 AS rows  ;

Output

Method 2 - No Changes to the data

Data

[{"identifier":"id#1","dataA":"dataA#1"},{"identifier":"id#2","dataA":"dataA#2"}]

SQL

CREATE EXTERNAL TABLE IF NOT EXISTS ramesh_raw_json (
  json STRING
)
LOCATION 'my_location';

SELECT get_json_object (exp.json_object, '$.identifier') AS Identifier,
       get_json_object (exp.json_object, '$.dataA') AS Identifier
  FROM ( SELECT json_object
           FROM ramesh_raw_json a
           LATERAL VIEW EXPLODE (split(regexp_replace(regexp_replace(a.json,'\\}\\,\\{','\\}\\;\\{'),'\\[|\\]',''), '\\;')) json_exploded AS json_object ) exp;

Output

How do I import an array of data into separate rows in a hive table?

There are 2 best solutions below

Related Questions in ARRAYS

Related Questions in HADOOP

Related Questions in HIVE

Related Questions in CREATE-TABLE

Related Questions in HIVE-SERDE

Trending Questions

Popular # Hahtags

Popular Questions