I have some historic tables from which I need to delete certain rows, because we are no longer allowed to keep that data. For audit reasons, I also need to remove this data from previous Delta table versions.
From what I've read, the VACUUM command should fit my use case, with a short retention period of 5 hours. I'm testing this, but the history doesn't go away after VACUUM, and the VACUUM operation isn't logged in the table history either.
Steps:
- Delete the rows from the table.
DELETE FROM delta.`path_to_table` WHERE code = 20
- Vacuum the delta table.
# Allow a retention period shorter than the 7-day default
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
result = spark.sql("VACUUM delta.`path_to_table` RETAIN 5 HOURS")
- Check history of delta table.
The table history is unchanged, with no VACUUM operation logged.
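For completeness, here is the whole sequence expressed in SQL (the path and the `code` column are placeholders from my setup; `DESCRIBE HISTORY` is just how I'm inspecting the transaction log afterwards):

```sql
-- 1. Remove the rows we can no longer keep
DELETE FROM delta.`path_to_table` WHERE code = 20;

-- 2. Physically remove old data files past the retention window
--    (retention check disabled beforehand via
--    spark.databricks.delta.retentionDurationCheck.enabled = false)
VACUUM delta.`path_to_table` RETAIN 5 HOURS;

-- 3. Inspect the table history; I expected to see a VACUUM entry here
DESCRIBE HISTORY delta.`path_to_table`;
```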
Any idea about this behaviour? Any suggestions on what I should try?
Thank you.