My requirement is to drop and recreate a ICEBERG table on every week (environment AWS Athena and storage as S3)
So I am running below 2 SQLs as script.
DROP TABLE my_table;
CREATE TABLE my_table WITH (
is_external = false,
location = 's3://my_bucket/my_table',
table_type='iceberg',
vacuum_max_snapshot_age_seconds=3600
write_compression='ZSTD',
format='PARQUET'
)
AS
SELECT * from my_backup_table;
Now the challenge is DROP SQL command never cleans up S3 location completely if the table becomes very huge and got accumulated many small files. So the CREATE TABLE script fails with
ICEBERG_FILESYSTEM_ERROR: Cannot create a table on a non-empty location: s3://my_bucket/my_table. You may need to manually clean the data at location 's3://my_bucket/Athena_Results/tables/13654f64-6fcb-4879-9e45-bc510be429ae' before retrying. Athena will not delete data in your account.
Now I recreate the table (after manual cleanup of S3 location) and find that object count in new table reduced drastically and then DROP/CTAS scripts works fine and DROP cleans up entire S3 location. But it won't after few days when object count will be increased.
The issue is due to the fact that
DROP
with iceberg times out under the hood, and the underlying s3 location is left dirty if you accumulate too many s3 objects. You don't notice such behavior with a fresh recreated table because the s3 objects are not to many.Few options: