How read data partitons in S3 from Trino

4.7k Views Asked by At

I'm trying to read data partitons in S3 from Trino. What I did exactly:

I uploaded my data with all partitions into S3. I have a specified avro schema, I put it in file local system.

Then I created an external hive table to point to the data location in S3 and to the avro schema in file local system. Table is created.

Then, normaly I can query my data and partitions in S3 from Trino.

Trino>select * from hive.default.my_table;

It return only columns names.

trino>select * from hive.default."my_table$partitions";

it return only name of partitions.

Could you please suggest me a solution how can I read data partitons in S3 from Trino ?

Knowing that I'm using Apache Hive 2, even when I query the table in hive to return the table partitions, it return Ok, and display any thing. I think because Hive 2 we should use MSCK command

2

There are 2 best solutions below

0
On BEST ANSWER

Faced the same issue. Once the table is created, we need to manually sync up the schema to the metastore using the below command of trino.

CALL system.sync_partition_metadata('<schema>', '<table>', 'ADD');

Ref.: https://trino.io/episodes/5.html

0
On

In Hive uploading partition folders and files into S3 and creating table is not enough, partition metadata should be created. Normally you can have folders not mounted as partitions. To mount all existing sub-folders in the table location as partitions:

Use msck repair table command:

MSCK [REPAIR] TABLE tablename;

or Amazon EMR version:

ALTER TABLE tablename RECOVER PARTITIONS;

It will create partition metadata in Hive metastore and partitions will become available.

Read more details about both commands here: RECOVER PARTITIONS