What are some ways I can debug a Spark SQL query?
I have defined a DataFrame with a Spark SQL query and call show(1) on it, but the query keeps running for a very long time. Could anyone provide pointers? Thank you!
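from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# spark.sql() only builds the DataFrame lazily; nothing runs until an action such as show().
list_of_id_df = spark.sql("""
select t1.id
from table1 t1
join table2 t2 on t1.id = t2.cd
where t1.product = 'I'
and not exists
(select * from table3 t3
where t3.id = t1.id
and t3.year = t1.year
and t3.month = t1.month
and t3.day = t1.day
and t3.code in ('4321','5604'))
and not exists
(select MAX(status) from table4 t4
where t4.id = t1.id
and t4.year = t1.year
and t4.month = t1.month
and t4.day = t1.day
having max(status) > 3)
""")
list_of_id_df.createOrReplaceTempView("list_of_id")
# show(1) is the action that actually triggers execution of the query.
list_of_id_df.show(1)
print("Done")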