ICEBERG_FILESYSTEM_ERROR: Cannot create a table on a non-empty location


My requirement is to drop and recreate an Iceberg table every week (environment: AWS Athena, with S3 as storage).

So I am running the two SQL statements below as a script.

DROP TABLE my_table;

CREATE TABLE my_table WITH (
is_external = false,
location = 's3://my_bucket/my_table',
table_type='iceberg',
vacuum_max_snapshot_age_seconds=3600,
write_compression='ZSTD',
format='PARQUET'
)
AS
SELECT * from my_backup_table;

The challenge is that the DROP TABLE command never cleans up the S3 location completely once the table becomes very large and has accumulated many small files. The CREATE TABLE statement then fails with:

ICEBERG_FILESYSTEM_ERROR: Cannot create a table on a non-empty location: s3://my_bucket/my_table. You may need to manually clean the data at location 's3://my_bucket/Athena_Results/tables/13654f64-6fcb-4879-9e45-bc510be429ae' before retrying. Athena will not delete data in your account.

If I recreate the table (after manually cleaning up the S3 location), the object count of the new table is drastically reduced, and the DROP/CTAS script then works fine, with DROP cleaning up the entire S3 location. But it stops working again after a few days, once the object count has grown back.
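For reference, this is a minimal sketch (assuming boto3, and using the bucket and prefix from the CREATE TABLE statement above as placeholders) to check how many objects DROP TABLE actually left behind under the table location:

import boto3

# Hypothetical bucket/prefix, taken from the CREATE TABLE statement above.
BUCKET = "my_bucket"
PREFIX = "my_table/"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Count every object still sitting under the table location after DROP TABLE.
leftover = 0
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    leftover += page.get("KeyCount", 0)

print(f"Objects remaining under s3://{BUCKET}/{PREFIX}: {leftover}")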


1 Answer


The issue is that DROP TABLE on an Iceberg table times out under the hood, and the underlying S3 location is left dirty when too many S3 objects have accumulated. You don't notice this behavior with a freshly recreated table because there aren't that many S3 objects yet.

A few options:

  • don't use DROP TABLE; instead use the Glue API plus S3 delete objects: look up the S3 location of the table, delete the table via the Glue DeleteTable API, then finally delete the objects under the S3 location (see the sketch after this list)
  • you could consider using dbt, specifically dbt-athena (https://github.com/dbt-athena/dbt-athena), which gives you more utilities; there, the first option described above is the native way of dealing with Iceberg tables
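A minimal sketch of the first option, assuming boto3 and hypothetical names (a Glue database called my_database, the my_table table and its location from the question). It picks up the table's S3 location from the Glue catalog, deletes the table via the Glue DeleteTable API, then removes all objects under that prefix:

import boto3
from urllib.parse import urlparse

# Hypothetical names; adjust to your Glue database and table.
DATABASE = "my_database"
TABLE = "my_table"

glue = boto3.client("glue")
s3 = boto3.client("s3")

# 1. Pick the S3 location of the table from the Glue catalog.
table = glue.get_table(DatabaseName=DATABASE, Name=TABLE)
location = table["Table"]["StorageDescriptor"]["Location"]  # e.g. s3://my_bucket/my_table
parsed = urlparse(location)
bucket, prefix = parsed.netloc, parsed.path.lstrip("/")

# 2. Delete the table via the Glue DeleteTable API (metadata only; data stays in S3).
glue.delete_table(DatabaseName=DATABASE, Name=TABLE)

# 3. Finally delete every object under the S3 location, in batches of up to 1000 keys.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
    if keys:
        s3.delete_objects(Bucket=bucket, Delete={"Objects": keys})

With this approach the cleanup no longer depends on Athena's DROP TABLE timeout, so it keeps working regardless of how many small files the table has accumulated.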