Does Spark Dynamic Allocation depend on external shuffle service to work well?


I want to use Spark DRA (Dynamic Resource Allocation) so that executors can be requested/released dynamically based on my application's workload to improve resource utilization. But I wonder whether I must enable the Spark external shuffle service to use DRA (that is, whether DRA depends on the external shuffle service to work).

In my opinion, DRA should depend on the external shuffle service to work well, so that a released executor's shuffle data can still be served to other executors after that executor is gone.

Could someone help explain whether my understanding is correct?


There are 2 answers below.

BEST ANSWER

Broadly speaking, you are right: there should be some persistence mechanism for dynamic allocation to work well. But in the narrower context of your question, I would go with a firm-ish NO, because modern Spark versions provide other means to persist and serve shuffle blocks besides the External Shuffle Service (ESS). This is stated clearly and concisely in the Spark configuration docs:

Property Name: spark.dynamicAllocation.enabled
Default: false
Meaning: Whether to use dynamic resource allocation...
This requires one of the following conditions:
1) enabling external shuffle service through spark.shuffle.service.enabled, or
2) enabling shuffle tracking through spark.dynamicAllocation.shuffleTracking.enabled, or
3) enabling shuffle blocks decommission through spark.decommission.enabled and spark.storage.decommission.shuffleBlocks.enabled, or
4) (Experimental) configuring spark.shuffle.sort.io.plugin.class to use a custom ShuffleDataIO whose ShuffleDriverComponents supports reliable storage.
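To make option 2 concrete, here is a minimal sketch of enabling dynamic allocation without any external shuffle service, letting Spark track shuffle state instead. Only the config keys come from the docs quoted above; the app name, executor bounds, and timeout value are purely illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: dynamic allocation relying on shuffle tracking (option 2),
// so no External Shuffle Service is needed on the cluster nodes.
val spark = SparkSession.builder()
  .appName("dra-shuffle-tracking-example")                    // illustrative name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")        // illustrative bounds
  .config("spark.dynamicAllocation.maxExecutors", "10")
  // Executors that still hold tracked shuffle data are kept until this timeout.
  .config("spark.dynamicAllocation.shuffleTracking.timeout", "30min")
  .getOrCreate()
```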

"No" is especially true for other Resource Managers than YARN, Kubernetes for example (which does not provide external shuffle service at all at the moment). The "-ish" in NO is because YARN still owns the majority, and requires that service for dynamic allocation.

SECOND ANSWER

Long story short --> YES and NO.

NO --> Strictly speaking, you need not enable the External Shuffle Service. This means recomputation in some cases.

YES --> You may wish to avoid recomputation, hence the use of the External Shuffle Service has been advised for quite some time.

There are some caveats. You can read more here: https://spark.apache.org/docs/3.5.1/job-scheduling.html#graceful-decommission-of-executors

Graceful Decommission of Executors

Before dynamic allocation, if a Spark executor exits when the associated application has also exited, then all state associated with the executor is no longer needed and can be safely discarded. With dynamic allocation, however, the application is still running when an executor is explicitly removed. If the application attempts to access state stored in or written by the executor, it may have to recompute that state. Thus, Spark needs a mechanism to decommission an executor gracefully by preserving its state before removing it.

This requirement is especially important for shuffles. During a shuffle, the Spark executor first writes its own map outputs locally to disk, and then acts as the server for those files when other executors attempt to fetch them. In the event of stragglers, which are tasks that run for much longer than their peers, dynamic allocation may remove an executor before the shuffle completes, in which case the shuffle files written by that executor must be recomputed unnecessarily.

The solution for preserving shuffle files is to use an external shuffle service, also introduced in Spark 1.2. This service refers to a long-running process that runs on each node of your cluster independently of your Spark applications and their executors. If the service is enabled, Spark executors will fetch shuffle files from the service instead of from each other. This means any shuffle state written by an executor may continue to be served beyond the executor’s lifetime.

In addition to writing shuffle files, executors also cache data either on disk or in memory. When an executor is removed, however, all cached data will no longer be accessible. To mitigate this, by default executors containing cached data are never removed. You can configure this behavior with spark.dynamicAllocation.cachedExecutorIdleTimeout. When spark.shuffle.service.fetch.rdd.enabled is set to true, Spark can use the ExternalShuffleService for fetching disk-persisted RDD blocks. In case of dynamic allocation, if this feature is enabled, executors having only disk-persisted blocks are considered idle after spark.dynamicAllocation.executorIdleTimeout and will be released accordingly. In future releases, the cached data may be preserved through an off-heap storage, similar in spirit to how shuffle files are preserved through the external shuffle service.
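To tie that last paragraph together, a sketch of the relevant settings might look as follows. The config keys are the ones named in the quoted docs; the app name and timeout values are purely illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: let the External Shuffle Service also serve disk-persisted RDD
// blocks, so executors holding only such blocks can still be released
// once they have been idle long enough.
val spark = SparkSession.builder()
  .appName("dra-cached-data-example")                                   // illustrative name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.shuffle.service.fetch.rdd.enabled", "true")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")         // illustrative
  .config("spark.dynamicAllocation.cachedExecutorIdleTimeout", "10min") // illustrative
  .getOrCreate()
```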