Stack:
Trino version: 395
Storage: Alluxio with AWS S3
Metadata store: AWS glue
I have a daily Spark job that writes Parquet files to S3 with three partition keys (year, month, day); the data is then synchronized to Alluxio. Although I can verify that all the data exists in both S3 and Alluxio, I can't query the latest partitions until I manually call system.sync_partition_metadata()
every time. This is how I create the table:
create table glue.table_tc.table_name (
    col1 varchar,
    col2 varchar,
    col3 varchar,
    col4 varchar,
    col5 bigint,
    year int,
    month int,
    day int
) with (
    format = 'parquet',
    partitioned_by = array['year', 'month', 'day'],
    external_location = 'alluxio://path/to/table'
);
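For reference, this is the manual workaround I run after each daily load (assuming the catalog is named glue, matching the DDL above; sync_partition_metadata takes the schema name, the table name, and a mode of ADD, DROP, or FULL):

```sql
-- Register partitions that exist on storage but are missing from Glue.
CALL glue.system.sync_partition_metadata(
    schema_name => 'table_tc',
    table_name  => 'table_name',
    mode        => 'ADD');
```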
I initially suspected that stale metastore caching was the cause. I therefore set hive.metastore-cache.cache-partitions to false to disable partition caching, and also shortened hive.metastore-refresh-interval to 5s, but neither worked.
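For completeness, these are the two properties I changed in the catalog configuration (the file name etc/catalog/glue.properties is just how my catalog happens to be named):

```properties
# etc/catalog/glue.properties
hive.metastore-cache.cache-partitions=false
hive.metastore-refresh-interval=5s
```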
How can I synchronize the metadata / partition values automatically? Did I miss something? Thank you so much for your help!