Is there a limit to new tasks for Spark speculation?

Let's say I run a job in Spark with speculation = true.
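For concreteness, something like this (a minimal sketch; the config keys are standard Spark settings, and the values are just illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("speculation-demo")
  .config("spark.speculation", "true")            // enable speculative execution
  .config("spark.speculation.interval", "100ms")  // how often to check for slow tasks
  .config("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow
  .config("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking
  .getOrCreate()
```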

If a task (say T1) takes a long time, Spark launches a copy of T1 (call it T2) on another executor, without killing T1.

Now, if T2 also takes more time than the median of all successfully completed tasks, would Spark launch another task T3 on another executor?

If yes, is there any limit to this spawning of new tasks? If not, does Spark limit itself to one speculative copy per task and wait indefinitely for either one to finish?


BEST ANSWER

Spark's TaskSetManager is responsible for that logic. Before launching a speculative copy, it checks that exactly one copy of the task is currently running, so a task can have at most two copies in flight: the original plus one speculative copy. In your example it would never launch T3, since T1 and T2 already make two running copies.
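A simplified sketch of that guard (this is NOT Spark's actual source, just the shape of the check; the parameter names are mine):

```scala
// A task is a speculation candidate only while exactly one copy of it
// is running and it has outlived the median-based threshold, so a
// third copy (T3 in the question) is never launched.
def speculatableTasks(
    copiesRunning: Array[Int],   // running copies per task index
    successful: Array[Boolean],  // finished flag per task index
    runtimeMs: Array[Long],      // elapsed time of the running copy
    medianMs: Long,              // median runtime of finished tasks
    multiplier: Double           // cf. spark.speculation.multiplier
): Seq[Int] =
  copiesRunning.indices.filter { i =>
    !successful(i) &&
    copiesRunning(i) == 1 &&     // the "at most one copy" guard
    runtimeMs(i) > (medianMs * multiplier).toLong
  }
```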

You can find the relevant part of the code in the checkSpeculatableTasks method of TaskSetManager in the Spark source.