Debug SparkSQL Query


What are some ways I can debug a Spark SQL query?

I have defined a DataFrame with a Spark SQL query and I have included show(1), but the query keeps running for a very long time. Could anyone provide pointers? Thank you!

# Assumes an active SparkSession named spark.
list_of_id_df = spark.sql("""
select t1.id
from table1 t1
join table2 t2 on t1.id = t2.cd
where t1.product = 'I'
and not exists
(select * from table3 t3
        where t3.id = t1.id
        and t3.year = t1.year
        and t3.month = t1.month
        and t3.day = t1.day
        and t3.code in ('4321','5604'))
and not exists
(select max(status) from table4 t4
        where t4.id = t1.id
        and t4.year = t1.year
        and t4.month = t1.month
        and t4.day = t1.day
        having max(status) > 3)
""")

list_of_id_df.createOrReplaceTempView("list_of_id")
list_of_id_df.show(1)

print("Done")
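
One way to narrow this down, sketched here under assumptions rather than as a confirmed fix: run the query in growing pieces, print the plan for each piece, and time show(1) on it. The first piece that becomes slow points at the expensive clause. Note that show(1) still has to execute the join and both not exists subqueries, so it can be slow when few rows qualify. The names BASE, NOT_IN_TABLE3, NOT_IN_TABLE4, STEPS and profile_steps are hypothetical helpers, not part of Spark; spark is assumed to be an active SparkSession in which table1 to table4 are already registered.

```python
import time

# Each step adds one clause of the original query.
BASE = """
select t1.id
from table1 t1
join table2 t2 on t1.id = t2.cd
where t1.product = 'I'
"""

NOT_IN_TABLE3 = """
and not exists
(select * from table3 t3
        where t3.id = t1.id
        and t3.year = t1.year
        and t3.month = t1.month
        and t3.day = t1.day
        and t3.code in ('4321','5604'))
"""

NOT_IN_TABLE4 = """
and not exists
(select max(status) from table4 t4
        where t4.id = t1.id
        and t4.year = t1.year
        and t4.month = t1.month
        and t4.day = t1.day
        having max(status) > 3)
"""

STEPS = [
    ("join only", BASE),
    ("join + table3 filter", BASE + NOT_IN_TABLE3),
    ("full query", BASE + NOT_IN_TABLE3 + NOT_IN_TABLE4),
]


def profile_steps(spark, steps=STEPS):
    """Print the plan and time show(1) for each step.

    Returns a list of (label, seconds) pairs, in the order the steps ran.
    """
    timings = []
    for label, sql in steps:
        df = spark.sql(sql)
        print(f"--- {label} ---")
        df.explain(True)  # parsed, analyzed, optimized and physical plans
        start = time.perf_counter()
        df.show(1)        # the action that actually triggers execution
        timings.append((label, time.perf_counter() - start))
    return timings
```

Calling profile_steps(spark) in the same session as the original code prints three plans and returns the three timings. In the plan output, the join strategy chosen for each not exists clause and whether the year, month and day filters reach the table scans are reasonable things to look at first.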
