programmatically deleting parquet partitions from S3 bucket using pyspark


I have a parquet dataset in S3 (written via s3fs), partitioned like so:

STATE='DORMANT'
-----> DATE=2020-01-01
-----> DATE=2020-01-02
             ....
-----> DATE=2020-11-01

STATE='ACTIVE'
-----> DATE=2020-01-01
-----> DATE=2020-01-02
             ....
-----> DATE=2020-11-01

Every day new data is appended to this dataset and partitioned accordingly.
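
For context, the daily append looks roughly like this (the bucket path, the value column, and the sample rows are simplified placeholders, not my real job):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical day's records; in reality these come from the upstream job.
    daily_df = spark.createDataFrame(
        [("ACTIVE", "2020-11-01", 123), ("DORMANT", "2020-11-01", 456)],
        ["STATE", "DATE", "value"],
    )

    # Append today's slice; Spark lays it out under STATE=.../DATE=... prefixes.
    daily_df.write.partitionBy("STATE", "DATE").mode("append").parquet(
        "s3a://my-bucket/my_table"
    )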

I would like to keep only the last 90 days of data and delete the rest. So when the 91st day of data comes in, it is appended and then day 1 is deleted from the DATE partitions. When day 92 comes in, day 2 is deleted, and so on.

Is this possible via pyspark?
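
What I had in mind is something along these lines: a rough sketch that reaches the Hadoop FileSystem behind the s3a:// path through Spark's JVM gateway and recursively deletes expired DATE=... partitions. The s3a://my-bucket/my_table root, the 90-day constant, and the ISO date parsing are assumptions about my layout, not a tested solution:

    import datetime
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    ROOT = "s3a://my-bucket/my_table"   # placeholder for the real dataset root
    RETENTION_DAYS = 90
    cutoff = datetime.date.today() - datetime.timedelta(days=RETENTION_DAYS)

    # Reach the Hadoop FileSystem behind the s3a:// path through the JVM gateway.
    Path = spark._jvm.org.apache.hadoop.fs.Path
    fs = Path(ROOT).getFileSystem(spark._jsc.hadoopConfiguration())

    # Walk STATE=... directories, then their DATE=... sub-directories,
    # and recursively delete any partition older than the cutoff.
    for state_status in fs.listStatus(Path(ROOT)):
        if not state_status.isDirectory():
            continue
        for date_status in fs.listStatus(state_status.getPath()):
            name = date_status.getPath().getName()      # e.g. "DATE=2020-01-01"
            if not name.startswith("DATE="):
                continue
            partition_date = datetime.date.fromisoformat(name.split("=", 1)[1])
            if partition_date < cutoff:
                fs.delete(date_status.getPath(), True)  # True => recursive

Is this a reasonable approach, or is there a more idiomatic way to handle partition retention in pyspark?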
