We collect raw data from various data delivery streams in S3, in Delta format. We chose Delta mainly because we want an easy way to compact the many small objects into larger S3 objects that can later be processed more (cost-)efficiently.

We want to keep this data for only 60 days and then expire it. Running DELETE on the tables does not physically remove the data; it only updates the transaction log so that the current table version no longer references the deleted data. Only a VACUUM physically removes objects from S3. However, touching the S3 objects costs additional money, which we want to avoid.

Our idea was to use S3 lifecycle policies to expire the data after 60 days, which costs almost nothing. But it seems that this is not recommended, because it can corrupt the Delta transaction log and render the table unusable. Is there a way to safely remove objects from S3 after a given time period without corrupting the Delta metadata?
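For reference, the two-step behaviour described above (a logical DELETE followed by a physical VACUUM) can be sketched as the pair of Delta SQL statements it would involve. This is only an illustration of the steps the question describes, not an answer to whether lifecycle rules are safe. The function name `retention_statements`, the bucket path, and the `event_date` column are hypothetical; the 168-hour value is Delta's default VACUUM retention of 7 days.

```python
from datetime import date, timedelta


def retention_statements(table_path, date_column, today,
                         keep_days=60, vacuum_retain_hours=168):
    """Build the Delta SQL for a logical delete and a later physical cleanup.

    table_path and date_column are hypothetical placeholders; adjust them
    to the real table layout.
    """
    # Rows older than this date are considered expired.
    cutoff = today - timedelta(days=keep_days)
    # Path-based table reference, as accepted by Delta's SQL syntax.
    table = f"delta.`{table_path}`"
    # Step 1: logical delete. Only the transaction log changes here.
    delete_sql = (
        f"DELETE FROM {table} WHERE {date_column} < '{cutoff.isoformat()}'"
    )
    # Step 2: physical cleanup of files no longer referenced by the log.
    vacuum_sql = f"VACUUM {table} RETAIN {vacuum_retain_hours} HOURS"
    return delete_sql, vacuum_sql


delete_sql, vacuum_sql = retention_statements(
    "s3://example-bucket/raw/events",  # hypothetical table location
    "event_date",                      # hypothetical date column
    date(2024, 3, 31),
)
print(delete_sql)
print(vacuum_sql)
```

In a Delta-enabled Spark session, each string would be passed to `spark.sql(...)`. The sketch only builds the statements, so it runs without Spark installed.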
Expire S3 objects after deletion from Delta Lake without breaking metadata
Asked by Alex