So I want to add these to a table in athena from a tsv file which I can do except for the last column genres. I mean I can add it but I want it to be like for example ["Comedy", "Mystery"] but it comes out as [Comedy,Mystery] which makes it impossible to access them in any way
tconst genres
tt0081313 Action
tt0081315 Comedy,Mystery
tt0081349 Comedy,Crime
This is what I did:
CREATE EXTERNAL TABLE `title_basics`(
`tconst` string,
`genres` Array<string>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim' = '\t' # This is for separating them by tab which is right but how can I also
# add the genres the way I want them to the table
)
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'mylocation'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='-----')
The column genre is being interpreted as a string. There are two possible solutions:
Or directly create the column as an array by adding a
,
as a collection separator.