Can we create several entries from one line?

185 Views Asked by At

My logs look like this: client_id;event_1;event_2;event3

And i would like to get an SQL Table like this:

 client_id |   event
 ---------------------
    ...    |  event_1
    ...    |  event_2
    ...    |  event_3

I am new to Hive, it seems to me that one log line always provides one entry in the resulting SQL table. I tried the following (unsuccessful):

CREATE EXTERNAL TABLE IF NOT EXISTS tablename (
         client_id String,
         `event` String
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
         "input.regex" = "^([^\;]+);.*([^\;]+).*$" )
LOCATION 's3://myBucket/prefix/';

It takes only the first event and ignore the others...

1

There are 1 best solutions below

3
leftjoin On BEST ANSWER

Unfortunately, it is not possible to generate rows using SerDe in table DDL. It's possible to do the same in Hive.

(1) Read all user events as a single column:

CREATE EXTERNAL TABLE IF NOT EXISTS tablename (
         client_id String,
         events    String
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
 "input.regex" = "^([^\\;]+)\\;(.*)$" )
LOCATION 's3://myBucket/prefix/';

Check, it should read two columns, user_id and all events concatenated:

'client_id' and 'event_1;event_2;event3'

(2) Split events and explode to generate rows:

 select t.user_id, e.event
   from tablename t
        lateral view outer explode(split(t.events,'\\;')) e as event;

Read also about Lateral View.

In Athena use UNNEST with CROSS JOIN:

select t.user_id, e.event
       from tablename t
       CROSS JOIN UNNEST(SPLIT(t.events,';')) AS e (event)