Trino Hive connector can't synchronize the partition metadata automatically

Stack:

Trino version: 395

Storage: Alluxio with AWS S3

Metadata store: AWS Glue

I have a daily Spark job that writes Parquet files partitioned by three keys (year, month, day) to S3, and all the data is then synchronized to Alluxio. However, although I have checked that the data exists in both S3 and Alluxio, I can't query the latest data until I manually call system.sync_partition_metadata() each time. This is how I create the table:

create table glue.table_tc.table_name (
    col1 varchar, 
    col2 varchar, 
    col3 varchar, 
    col4 varchar, 
    col5 bigint, 
    year int, 
    month int, 
    day int
) with (
    format='parquet', 
    partitioned_by=array['year', 'month', 'day'], 
    external_location='alluxio://path/to/table');
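
For reference, the manual call that makes the new partitions visible looks roughly like the sketch below. The catalog, schema, and table names are taken from the table definition above; the `ADD` mode is an assumption on my part (the procedure also accepts `DROP` and `FULL`).

```sql
-- Sketch of the manual sync: registers partitions that exist in storage
-- but are not yet in the metastore.
-- Arguments: schema name, table name, mode ('ADD' is assumed here).
CALL glue.system.sync_partition_metadata('table_tc', 'table_name', 'ADD');
```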

I initially thought this was caused by the metadata cache being out of sync. Therefore, I tried setting hive.metastore-cache.cache-partitions to false to avoid caching. I also tried shortening hive.metastore-refresh-interval to 5s, but neither worked.

How can I synchronize the partition metadata automatically? Did I miss something? Thank you so much for your help!
