Why is AQE not shown?


My code looks like this:

sql = '''
SELECT ...
FROM a 
LEFT JOIN b ON ...
LEFT JOIN c ON ...
LEFT JOIN d ON ...
'''
df = spark.sql(sql)
(df
 .repartition('col')
 .write
 .format('parquet')
 .mode('overwrite')
 .partitionBy('col')
 .option(...)
 .saveAsTable('...')
)

The final plan shows two broadcast joins and one SortMergeJoin. The SortMergeJoin is a LEFT JOIN between tables of 100+ million and 200+ million rows, and it is skewed. I enabled AQE and played with some configs (e.g. spark.sql.shuffle.partitions=40000, spark.default.parallelism=400), but I don't see AQE coalescing partitions, and the AdaptiveSparkPlan node does not appear in the plan. Many of the AQE examples I have seen use GROUP BY. Does AQE only work with GROUP BY? Is there any reason why the AdaptiveSparkPlan node is not shown for my query?
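To make the check concrete, here is a minimal sketch of how the AQE settings and the plan could be inspected from PySpark. The helper names (`plan_uses_aqe`, `explain_text`, `report_aqe`) are hypothetical and only for illustration. The config keys are the standard Spark SQL AQE settings, and the sketch assumes `df.explain()` prints the plan to standard output, as it does in regular PySpark.

```python
import contextlib
import io

# Standard Spark SQL config keys that control AQE and two of its features.
AQE_KEYS = (
    "spark.sql.adaptive.enabled",
    "spark.sql.adaptive.coalescePartitions.enabled",
    "spark.sql.adaptive.skewJoin.enabled",
)


def plan_uses_aqe(plan_text):
    """Return True if the plan text mentions an AdaptiveSparkPlan node."""
    return "AdaptiveSparkPlan" in plan_text


def explain_text(df):
    """Capture whatever df.explain() prints and return it as a string."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def report_aqe(spark, df):
    """Print the effective AQE settings and whether the plan is adaptive.

    `spark` is an existing SparkSession and `df` the DataFrame in question.
    """
    for key in AQE_KEYS:
        # The second argument is the fallback when the key is not set.
        print(key, "=", spark.conf.get(key, "<not set>"))
    print("AdaptiveSparkPlan in plan:", plan_uses_aqe(explain_text(df)))
```

With the `spark` session and `df` from the code above, calling `report_aqe(spark, df)` before the write would show whether the settings actually took effect in this session. Note that this only inspects the plan of `df` itself, not the plan of the final write job.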

Thanks
