More tasks than vCPUs on Dataproc


We are observing some strange behaviour concerning the number of executors and tasks on Dataproc. Our understanding is that, in theory, the number of cores available in the cluster limits the number of tasks that can run in parallel: 32 cores means a maximum of 32 concurrent tasks. However, on Dataproc we often observe different behaviour, basically double the theoretically possible number of concurrent tasks. Here is an example:

We run a Dataproc cluster with 12+1 (master) n1-standard-4 machines. This gives 48 available vcores (12 workers × 4 vCPUs), with 15 GB RAM per machine. We launch our Spark app with

spark.executor.cores = 4

... which should give us 12 executors, each able to run 4 tasks in parallel, i.e. 48 parallel tasks, while underutilising memory, since Dataproc automatically assigns spark.executor.memory = 5586m. However, what actually happens is that we seem to end up with 24 executors, running a total of 92 tasks in parallel, and hence finishing (almost) twice as fast. We don't understand why.
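The arithmetic behind our expectation can be sketched as a quick sanity check (the cluster sizes are the ones from this question; nothing here queries Spark itself):

```python
# Expected Spark parallelism for the cluster described above.
workers = 12          # n1-standard-4 worker nodes (master excluded)
vcpus_per_worker = 4  # n1-standard-4 = 4 vCPUs, 15 GB RAM
executor_cores = 4    # spark.executor.cores

total_vcores = workers * vcpus_per_worker            # 48
expected_executors = total_vcores // executor_cores  # 12
expected_tasks = expected_executors * executor_cores # 48

print(total_vcores, expected_executors, expected_tasks)  # 48 12 48
```

The observed 24 executors and ~92 concurrent tasks are double these expected figures, which is exactly the discrepancy described above.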

[Spark monitor screenshot]

The YARN monitor also tells us that there are 24 containers, although there should be 12 (one per executor, with 4 cores each).

[YARN monitor screenshot]


There is 1 answer below:


Please check the number of CPUs per node and the number of threads per CPU, and verify that you actually have as many cores as you say — with hyperthreading enabled, the number of hardware threads visible to the OS can be double the number of physical cores.
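A minimal sketch of that check, run on a worker node (the physical-core count is read from /proc/cpuinfo, so that part is Linux-only; it is not a Dataproc or YARN API):

```python
import os

# Logical CPUs = hardware threads visible to the OS; this is typically
# the number that resource managers see as "vcores".
logical = os.cpu_count()
print(f"logical CPUs: {logical}")

# On Linux, count distinct (physical id, core id) pairs to get
# physical cores; with hyperthreading, logical is usually 2x this.
physical = None
try:
    cores = set()
    phys_id = core_id = None
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("physical id"):
                phys_id = line.split(":", 1)[1].strip()
            elif line.startswith("core id"):
                core_id = line.split(":", 1)[1].strip()
            elif not line.strip():
                # Blank line ends one processor entry.
                if phys_id is not None and core_id is not None:
                    cores.add((phys_id, core_id))
                phys_id = core_id = None
    if cores:
        physical = len(cores)
except OSError:
    pass  # Not Linux, or /proc not available.
print(f"physical cores: {physical}")
```

If the logical count is twice the physical count, the "extra" capacity the scheduler reports comes from hardware threads, not additional cores.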