apache spark executors and data locality

Question

apache spark executors and data locality

628 Views Asked by user045213 At 17 October 2025 at 06:33

The spark literature says

Each application gets its own executor processes, which stay up for the duration of the whole application and run tasks in multiple threads.

And If I understand this right, In static allocation the executors are acquired by the Spark application when the Spark Context is created on all nodes in the cluster (in a cluster mode). I have a couple of questions

If executors are acquired on all nodes and will stay allocated to this application during the the duration of the whole application, isn't there a chance a lot of nodes remain idle?
What is the advantage of acquiring resources when Spark context is created and not in the DAGScheduler? I mean the application could be arbitrarily long and it is just holding the resources.
So when the DAGScheduler tries to get the preferred locations and the executors in those nodes are running the tasks, would it relinquish the executors on other nodes?

I have checked a related question Does Spark on yarn deal with Data locality while launching executors

But I'm not sure there is a conclusive answer

Original Q&A

There are 1 best solutions below

**Avishek Bhattacharya** · Answer 1

If executors are acquired on all nodes and will stay allocated to this application during the the duration of the whole application, isn't there a chance a lot of nodes remain idle?

Yes there is chance. If you have data skew this will happen. The challenge is to tune the executors and executor core so that you get maximum utilization. Spark also provides dynamic resource allocation which ensures the idle executors are removed.

What is the advantage of acquiring resources when Spark context is created and not in the DAGScheduler? I mean the application could be arbitrarily long and it is just holding the resources.

Spark tries to keep data in memory while doing transformation. Contrary to map-reduce model where after every Map operation it writes to disk. Spark can keep the data in memory only if it can ensure the code is executed in the same machine. This is the reason of allocating resource beforehand.

So when the DAGScheduler tries to get the preferred locations and the executors in those nodes are running the tasks, would it relinquish the executors on other nodes?

Spark can't start a task on an executor unless the executor is free. Now spark application master negotiates with the yarn to get the preferred location. It may or may not get that. If it doesn't get, it will start task in different executor.

apache spark executors and data locality

There are 1 best solutions below

Related Questions in APACHE-SPARK

Related Questions in EXECUTORS

Trending Questions

Popular # Hahtags

Popular Questions