Insert overwrite partition data using Spark SQL on a MinIO-backed table


I have a MinIO-backed table partitioned by `run_date` that holds one year of data (a small sample is shown below). I want to delete the last 3 days of data and reload it from another table. I am trying the query below, but it is not dropping the partitions. Any help will be highly appreciated.

  >>> spark.sql('''select * from dx_dl_abc_xyz.test''').show(10,False)
+------------------+------------------+-----------+----------+
|asset_name        |module_name       |application|run_date  |
+------------------+------------------+-----------+----------+
|Campaign Reporting|Campaign Reporting|Marketing  |2024-03-21|
|Orders            |Orders            |C&R        |2024-03-18|
|CX                |CX                |CX&Digital |2024-03-20|
|APM               |APM               |C&R        |2024-03-19|
+------------------+------------------+-----------+----------+

CREATE TABLE `dx_dl_abc_xyz`.`test` (
`asset_name` STRING,
`module_name` STRING,
`application` STRING,
`run_date` STRING)
USING parquet
PARTITIONED BY (run_date)
LOCATION 's3a://dx.dl.abc.xyz/abc_dashboard/test'

sqlContext.setConf("hive.exec.dynamic.partition", "true")
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

spark.sql('''alter table dx_dl_abc_xyz.test drop if exists 
partition(run_date='2024-03-21')''').show()
spark.sql('''msck repair table dx_dl_abc_xyz.test''').show()

Even after running the above commands, I still see all 4 records.
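A likely explanation: for a table created with an explicit `LOCATION`, `ALTER TABLE ... DROP PARTITION` removes only the partition metadata, not the underlying Parquet files on MinIO, and the subsequent `MSCK REPAIR TABLE` re-registers every partition directory it finds on storage, so the dropped partition comes straight back. A sketch of the insert-overwrite approach from the title, using Spark's dynamic partition overwrite (the source table name `dx_dl_abc_xyz.source_test` is a placeholder, not from the original post):

```sql
-- Overwrite only the partitions produced by the SELECT, leaving other partitions untouched
SET spark.sql.sources.partitionOverwriteMode = dynamic;

INSERT OVERWRITE TABLE dx_dl_abc_xyz.test PARTITION (run_date)
SELECT asset_name, module_name, application, run_date
FROM dx_dl_abc_xyz.source_test            -- placeholder source table
WHERE run_date >= date_sub(current_date(), 3);
```

With `partitionOverwriteMode=dynamic`, Spark rewrites the data files for exactly the `run_date` partitions emitted by the `SELECT`; under the default `static` mode, the same statement would truncate every partition of the target table first.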

Thanks, Debasis
