Implement a custom Spark shuffle writer/reader

47 Views Asked by Brave At 08 May 2023 at 14:46

I'm feeling a bit lost reading the Spark documentation. I'm trying to do something "simple": I want to replace the part of the shuffle code that is responsible for writing data to disk, and the part that is responsible for reading a block of data back from disk.

From what I've read, the ShuffleManager is the component responsible for shuffling, and it owns two components that deal specifically with writes and reads: ShuffleWriter and ShuffleReader.

What I don't completely understand is the API: what do the ShuffleReader and ShuffleWriter receive, and what do they return? Who sends requests to the ShuffleWriter, and how does it reply? Who sends requests to the ShuffleReader, and how does it reply? In both cases I'm guessing it's the ShuffleManager, but I'm not sure exactly what it sends and what it expects to receive.
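To make the question concrete, here is a toy Java model of the call shape as I currently picture it. It is only a sketch under assumptions: `ToyShuffleManager`, `Writer`, and `Reader` are hypothetical names and not Spark's classes, the in-memory map stands in for shuffle files, and hash partitioning is assumed. As far as I can tell from the Spark source, the real `ShuffleManager` exposes `registerShuffle`, `getWriter`, and `getReader`; a map task obtains a writer and pushes an iterator of key/value records into `write`, and a reduce task obtains a reader for a range of partitions and pulls an iterator back from `read()`. The exact signatures differ between Spark versions, so please check the version you build against.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these are Spark's classes; they just mimic the call shape.
final class ToyShuffleManager {
    // Stand-in for shuffle files: mapId -> one bucket of records per reduce partition.
    private final Map<Long, List<List<Map.Entry<String, Integer>>>> mapOutputs = new TreeMap<>();
    private final int numPartitions;

    ToyShuffleManager(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // A map task asks for a writer, identified by its own map ID.
    Writer getWriter(long mapId) {
        return new Writer(mapId);
    }

    // A reduce task asks for a reader covering partitions [startPartition, endPartition).
    Reader getReader(int startPartition, int endPartition) {
        return new Reader(startPartition, endPartition);
    }

    final class Writer {
        private final long mapId;

        Writer(long mapId) {
            this.mapId = mapId;
        }

        // The caller pushes all records of one map task; the writer buckets them by partition.
        void write(Iterator<Map.Entry<String, Integer>> records) {
            List<List<Map.Entry<String, Integer>>> buckets = new ArrayList<>();
            for (int i = 0; i < numPartitions; i++) {
                buckets.add(new ArrayList<>());
            }
            while (records.hasNext()) {
                Map.Entry<String, Integer> rec = records.next();
                int reduceId = Math.floorMod(rec.getKey().hashCode(), numPartitions);
                buckets.get(reduceId).add(rec);
            }
            mapOutputs.put(mapId, buckets); // "commit" the output of this map task
        }
    }

    final class Reader {
        private final int startPartition;
        private final int endPartition;

        Reader(int startPartition, int endPartition) {
            this.startPartition = startPartition;
            this.endPartition = endPartition;
        }

        // The caller pulls every record for its partitions, across all map outputs.
        Iterator<Map.Entry<String, Integer>> read() {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (List<List<Map.Entry<String, Integer>>> buckets : mapOutputs.values()) {
                for (int p = startPartition; p < endPartition; p++) {
                    out.addAll(buckets.get(p));
                }
            }
            return out.iterator();
        }
    }
}
```

In this model the manager is only a factory: it never sends requests itself. The tasks drive the writer and the reader, which is the part I would like confirmed for real Spark.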

I'm guessing that a write (or is "spill" the more accurate term?) or read operation (of an RDD?) needs to be identified by some ID or combination of IDs, but I'm not sure exactly how that works.
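On the ID question, my understanding (please correct me if this is wrong) is that Spark names a shuffle block by three numbers: the shuffle ID, the map ID, and the reduce partition ID, rendered as `shuffle_<shuffleId>_<mapId>_<reduceId>`. The small Java class below only illustrates that naming scheme; `ToyShuffleBlockId` is a hypothetical stand-in and not Spark's `ShuffleBlockId`.

```java
// Hypothetical illustration of the (shuffleId, mapId, reduceId) naming scheme.
// This is not Spark's ShuffleBlockId class, only a sketch of the same idea.
final class ToyShuffleBlockId {
    final int shuffleId;  // which shuffle (one per shuffle dependency)
    final long mapId;     // which map task produced the data
    final int reduceId;   // which reduce partition the data is destined for

    ToyShuffleBlockId(int shuffleId, long mapId, int reduceId) {
        this.shuffleId = shuffleId;
        this.mapId = mapId;
        this.reduceId = reduceId;
    }

    // Render the three IDs as a single block name.
    String name() {
        return "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId;
    }

    // Parse a block name back into its three IDs, rejecting anything malformed.
    static ToyShuffleBlockId parse(String name) {
        String[] parts = name.split("_");
        if (parts.length != 4 || !parts[0].equals("shuffle")) {
            throw new IllegalArgumentException("Not a shuffle block name: " + name);
        }
        return new ToyShuffleBlockId(
                Integer.parseInt(parts[1]),
                Long.parseLong(parts[2]),
                Integer.parseInt(parts[3]));
    }
}
```

If that is right, a custom writer would need to record where each (map ID, reduce ID) pair lives, and a custom reader would resolve the same triple to fetch it.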

If you can help me fill in the gaps and describe the API, that would be great.
