Does Spark shuffle write all intermediate data to disk, or only that which will not fit in memory ("spill")?
In particular, if the intermediate data is small, will anything be written to disk, or will the shuffle be performed entirely using memory without writing anything to disk?
I've checked the docs and related StackOverflow questions, but they weren't clear on this precise question.
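Short answer: yes. Spark always writes the map-side shuffle output to local disk, even when the data is small enough to fit in memory. Memory management only affects how much is buffered and spilled along the way.
Memory management is better in Spark 3.0, which uses unified memory management.
Map Phase:
The map tasks write their output, partitioned by target reducer, to files on their local disks.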
Reduce Phase:
The reduce tasks fetch their data partitions, which the map tasks wrote out, from the map tasks' local disks, bringing them into memory for processing.
The reduce tasks then operate on the merged data, performing the necessary computations.
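
To make the two phases concrete, here is a toy sketch in plain Python. It is not Spark code and does not use any Spark API: the names `partition_for`, `map_task`, `reduce_task` and `run_shuffle` are made up for illustration, and the one-JSON-file-per-partition layout is an assumption that real Spark does not share (its file layout and serialization differ). It only shows the shape of the flow: map tasks always write their partitioned output to disk, however small, and reduce tasks read it back and merge it.

```python
import json
import tempfile
import zlib
from collections import defaultdict
from pathlib import Path


def partition_for(key, num_reducers):
    # Deterministic hash partitioner (Python's built-in hash() of a str
    # is salted per process, so crc32 is used instead).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(map_id, records, num_reducers, shuffle_dir):
    """Bucket records by target reducer, then write every bucket to disk."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition_for(key, num_reducers)].append([key, value])
    # Hypothetical layout: one file per (map, reducer) pair, written even
    # when the bucket is tiny or empty.
    for reducer_id in range(num_reducers):
        path = shuffle_dir / f"shuffle_{map_id}_{reducer_id}.json"
        path.write_text(json.dumps(buckets.get(reducer_id, [])))


def reduce_task(reducer_id, num_maps, shuffle_dir):
    """Fetch this reducer's partition from every map output and sum by key."""
    totals = defaultdict(int)
    for map_id in range(num_maps):
        path = shuffle_dir / f"shuffle_{map_id}_{reducer_id}.json"
        for key, value in json.loads(path.read_text()):
            totals[key] += value
    return dict(totals)


def run_shuffle(map_inputs, num_reducers):
    """Run all map tasks, then all reduce tasks; return results and file count."""
    with tempfile.TemporaryDirectory() as tmp:
        shuffle_dir = Path(tmp)
        for map_id, records in enumerate(map_inputs):
            map_task(map_id, records, num_reducers, shuffle_dir)
        files_written = len(list(shuffle_dir.iterdir()))
        results = {}
        for reducer_id in range(num_reducers):
            results.update(reduce_task(reducer_id, len(map_inputs), shuffle_dir))
        return results, files_written
```

Calling `run_shuffle` with two small map inputs and two reducers writes four files even though the whole dataset is a handful of records, which mirrors the point of the answer: small shuffle data still goes through disk.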