Spark shuffle service on a local shared directory with Ceph on Kubernetes


We run Spark 3.x on Kubernetes, and all executor pods share the same ReadWriteMany Ceph volume.

So every Spark executor writes its shuffle data to the same volume (I guess into different directories), and those files are readable by every other executor.
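To make the setup concrete, here is a rough sketch of how such a shared volume might be wired in. The claim name `ceph-rwx-claim` and the mount path `/mnt/shuffle` are made-up placeholders, not our real values, and the helper function is only for illustration. It relies on the documented Spark-on-Kubernetes convention that a volume whose name starts with `spark-local-dir-` is used as executor local storage.

```python
def shared_local_dir_conf(claim_name, mount_path, index=1):
    """Build Spark conf entries that mount one PVC into every executor pod.

    The volume name starts with "spark-local-dir-", which Spark on
    Kubernetes treats as local scratch storage (shuffle and spill files).
    claim_name and mount_path are hypothetical placeholders.
    """
    volume_name = f"spark-local-dir-{index}"
    prefix = (
        "spark.kubernetes.executor.volumes."
        f"persistentVolumeClaim.{volume_name}"
    )
    return {
        f"{prefix}.options.claimName": claim_name,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
    }


# Placeholder values; print the entries as spark-submit flags.
conf = shared_local_dir_conf("ceph-rwx-claim", "/mnt/shuffle")
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

This only describes where the shuffle files land. It does not change how they are fetched, which is what the question is about.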

On the other hand, Spark still transfers shuffle data between executors over the network.

How can I configure Spark to read another executor's shuffle data directly from the shared volume instead of fetching it over TCP?
