In the S3 storage, the partition is stored in the format "partition_mon_yr=202203." Upon ingestion by Glue, Glue interprets "partition_mon_yr" as the field name and "202203" as its corresponding value. This discrepancy arises because Snowflake external files construct the partition field from the metadata field value, resulting in a combined representation like "partition_mon_yr=202203."
Is it a good idea to extract the substring and store the "202203" value in the partition_mon_yr column in Snowflake? so that if the same data is used in Glue and Snowflake both will display in the same way. But my question arises, If I store whether partitioning against S3 will work or snowflake partitioning on an external table will not work with S3.
In Snowflake
Actual partition value
CREATE EXTERNAL TABLE et1(
partition_mon_yr string AS SPLIT_PART(metadata$filename, '/', 3)
col1 varchar AS (value:col1::varchar)
col2 varchar AS (value:col2::varchar))
PARTITION BY (partition_mon_yr)
LOCATION=@s1/logs/
AUTO_REFRESH = true
FILE_FORMAT = (TYPE = PARQUET);
partition_mon_yr contains the value partition_mon_yr=202203
Extract the YYYYMO value:
CREATE EXTERNAL TABLE et1(
partition_mon_yr string AS SPLIT_PART(SPLIT_PART(metadata$filename, '/', 3),'=',2),
col1 varchar AS (value:col1::varchar)
col2 varchar AS (value:col2::varchar))
PARTITION BY (partition_mon_yr)
LOCATION=@s1/logs/
AUTO_REFRESH = true
FILE_FORMAT = (TYPE = PARQUET);
partition_mon_yr contains the value 202203.