Spark shuffle disk spill increase when upgrading versions

When upgrading from Spark 2.3 to Spark 2.4.3, I saw a 20-30% increase in the amount of shuffle disk spill generated by one of my stages.

The same code is being executed in both environments.

All configurations are identical between the two environments.

There is 1 best solution below

Run .explain(false) on the query in both 2.4.3 and 2.3.0 and compare the resulting plans. Also dump the configs actually in effect on both. There were changes to the optimization rules between those releases. Also, where are you running Spark? It is a dirty secret that many Spark providers customize and improve Spark under the hood. I suspect there is more going on than you think.
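
To make the config comparison concrete, here is a minimal sketch in plain Python that diffs two config dumps. It assumes you have already collected each environment's settings as key/value pairs, for example from `spark.sparkContext.getConf().getAll()` in PySpark (which only lists explicitly set values) or from the output of `spark.sql("SET -v")`. The helper name `diff_configs` and the sample values are hypothetical and for illustration only; they are not real defaults of either release.

```python
from typing import Dict, Optional, Tuple


def diff_configs(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (old_value, new_value)} for every key whose value differs.

    A key that is missing on one side is reported with None on that side.
    """
    diffs = {}
    for key in sorted(set(old) | set(new)):
        old_value, new_value = old.get(key), new.get(key)
        if old_value != new_value:
            diffs[key] = (old_value, new_value)
    return diffs


# Hypothetical dumps; in practice build these with something like
# dict(spark.sparkContext.getConf().getAll()) on each cluster.
conf_2_3 = {
    "spark.sql.shuffle.partitions": "200",
    "spark.executor.memory": "8g",
}
conf_2_4 = {
    "spark.sql.shuffle.partitions": "200",
    "spark.executor.memory": "6g",
    "spark.memory.fraction": "0.5",
}

for key, (old_value, new_value) in diff_configs(conf_2_3, conf_2_4).items():
    print(f"{key}: 2.3 -> {old_value!r}, 2.4.3 -> {new_value!r}")
```

Any key that shows up here is a candidate explanation for the extra spill, even when the job was submitted with the same arguments, because a default or a provider-level override may differ between the two environments.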