Can we create several entries from one line?

167 Views Asked by At

My logs look like this: client_id;event_1;event_2;event3

And i would like to get an SQL Table like this:

 client_id |   event
 ---------------------
    ...    |  event_1
    ...    |  event_2
    ...    |  event_3

I am new to Hive, it seems to me that one log line always provides one entry in the resulting SQL table. I tried the following (unsuccessful):

CREATE EXTERNAL TABLE IF NOT EXISTS tablename (
         client_id String,
         `event` String
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
         "input.regex" = "^([^\;]+);.*([^\;]+).*$" )
LOCATION 's3://myBucket/prefix/';

It takes only the first event and ignore the others...

1

There are 1 best solutions below

3
On BEST ANSWER

Unfortunately, it is not possible to generate rows using SerDe in table DDL. It's possible to do the same in Hive.

(1) Read all user events as a single column:

CREATE EXTERNAL TABLE IF NOT EXISTS tablename (
         client_id String,
         events    String
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
 "input.regex" = "^([^\\;]+)\\;(.*)$" )
LOCATION 's3://myBucket/prefix/';

Check, it should read two columns, user_id and all events concatenated:

'client_id' and 'event_1;event_2;event3'

(2) Split events and explode to generate rows:

 select t.user_id, e.event
   from tablename t
        lateral view outer explode(split(t.events,'\\;')) e as event;

Read also about Lateral View.

In Athena use UNNEST with CROSS JOIN:

select t.user_id, e.event
       from tablename t
       CROSS JOIN UNNEST(SPLIT(t.events,';')) AS e (event)