I'm feeling a bit lost reading the Spark documentation. I'm trying to do something "simple": I want to replace the part of the shuffle code that is responsible for writing data to disk and the part that is responsible for reading blocks of data from disk.
From what I've read, the ShuffleManager is the component responsible for shuffling, and it owns two instances that deal specifically with writes and reads: ShuffleWriter and ShuffleReader.
What I don't completely understand is the API: what do the ShuffleReader and ShuffleWriter receive, and what do they return? Who sends requests to the ShuffleWriter, and how does it reply? Who sends requests to the ShuffleReader, and how does it reply? In both cases I'm guessing it's the ShuffleManager, but I'm not sure exactly what it sends and what it expects to receive.
I'm guessing that a write (is "spill" a more accurate description?) or read operation (of an RDD?) needs to be identified by some ID or combination of IDs, but I'm not sure exactly how that works.
If you can help me fill in the gaps and describe the API, that would be great.