I need to retain say last 7 partitions and data of a given hive external table.
This can be either done via a shell script or a hive hql script. The table is partitioned by intgestion_date=YYYY-MM-DD
what would be the best way to find the cutoff date (of 7th partition) which I can then use in the drop partitions where clause to drop everything older than that. since it's an external table, I will have to change the table properties to make it internal before the drop and then revert it.
There are different possible approaches: drop all partitions older than 7 days, this is easy (shell):
But it seems this is not exactly what you want. Need to get 7th partition first and use it in the previous statement. Execute show partition, use sort, head and tail to get 7th partition: