Expire S3 objects after deletion from Delta Lake without breaking metadata


We collect raw data from various data delivery streams in S3, in Delta format. We chose Delta mainly because we want an easy way to compact the many small objects into bigger S3 objects that can later be processed more (cost-)efficiently. We want to keep this data for only 60 days and then expire it.

DELETE-ing the data from the tables does not physically remove it; it only updates the transaction log so that the table no longer references the affected data files. Only a VACUUM physically removes objects from S3. However, touching the S3 objects costs additional money, which we want to avoid. Our idea was to use S3 lifecycle policies to expire the data after 60 days, which costs almost nothing. But this does not seem to be recommended, because it can expire objects that the Delta transaction log still references (or the log files themselves) and render the table unusable.

Is there a way to safely remove objects from S3 after a given time period without corrupting the Delta metadata?
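
For reference, the DELETE-then-VACUUM flow described above could be sketched roughly as follows. This is only an illustration, not our actual job: the table name `raw_events`, the column `ingest_date`, and the helper `retention_statements` are hypothetical, and the 168-hour VACUUM window is assumed here because it matches Delta's default 7-day retention for removed files. The helper only builds the SQL strings; running them would require a Spark session with Delta enabled.

```python
from datetime import date, timedelta


def retention_statements(
    table: str,
    date_column: str,
    today: date,
    keep_days: int = 60,
    vacuum_retain_hours: int = 168,
) -> list[str]:
    """Build the DELETE and VACUUM statements for a simple retention job.

    `table` and `date_column` are hypothetical names used for illustration.
    """
    # Rows older than the cutoff are logically removed from the table.
    cutoff = today - timedelta(days=keep_days)
    delete_sql = (
        f"DELETE FROM {table} WHERE {date_column} < '{cutoff.isoformat()}'"
    )
    # VACUUM physically removes files the table no longer references,
    # once they are older than the retention window.
    vacuum_sql = f"VACUUM {table} RETAIN {vacuum_retain_hours} HOURS"
    return [delete_sql, vacuum_sql]


statements = retention_statements("raw_events", "ingest_date", date(2024, 3, 1))
for sql in statements:
    # In a real job each statement would be passed to spark.sql(sql);
    # here they are only printed.
    print(sql)
```

The VACUUM step is the part whose S3 request cost we would like to avoid, which is why the lifecycle policy looked attractive.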

