Speculative execution in MapReduce/Spark


I know the Hadoop/Spark frameworks will detect failed or slow machines and execute the same tasks on a different machine. How (on what basis) does the framework identify slow-running machines? Is there some kind of statistic the framework uses to decide?

Can someone shed some light here?


2 Answers

BEST ANSWER

The MapReduce model breaks a job into tasks and runs those tasks in parallel, so the overall job execution time is smaller than it would be if the tasks ran sequentially.

yarn.app.mapreduce.am.job.task.estimator.class - When MapReduce launches a new job, the implementation named by this property is used to estimate each task's completion time at runtime. The estimated completion time for a task should be less than a minute. If a task runs beyond this estimated time, it can be marked as a slow-running task.

yarn.app.mapreduce.am.job.speculator.class - This property names the class that implements the speculative execution policy (see the sketch below).
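
As a minimal sketch (assuming a standard Hadoop 2.x/3.x setup; the class names below are, to my knowledge, the stock defaults, and the job name is made up for illustration), these knobs can be set on a job's Configuration, alongside the usual mapreduce.map.speculative / mapreduce.reduce.speculative switches that turn speculation on or off:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.Job

// Sketch: enable speculative execution and point the application master
// at the stock estimator/speculator implementations.
val conf = new Configuration()
conf.setBoolean("mapreduce.map.speculative", true)
conf.setBoolean("mapreduce.reduce.speculative", true)
conf.set("yarn.app.mapreduce.am.job.speculator.class",
  "org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator")
conf.set("yarn.app.mapreduce.am.job.task.estimator.class",
  "org.apache.hadoop.mapreduce.v2.app.speculate.LegacyTaskRuntimeEstimator")
val job = Job.getInstance(conf, "speculation-demo")

The estimator feeds per-task runtime estimates to the speculator, which decides whether launching a duplicate attempt of a slow task is worthwhile.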

ANSWER

spark.speculation (default value: false) - If you set it to "true", Spark performs speculative execution of tasks. This means that if one or more tasks are running slowly in a stage, they will be re-launched on another executor, and the first attempt to finish wins.


http://spark.apache.org/docs/latest/configuration.html
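
To answer the "on what basis" part: Spark keeps runtime statistics for each stage's finished tasks. Roughly, once spark.speculation.quantile of the tasks in a stage have succeeded, any still-running task that has taken longer than spark.speculation.multiplier times the median successful runtime becomes a candidate for a speculative copy. A simplified sketch of that heuristic (illustrative only, not the real TaskSetManager code):

// Simplified sketch of Spark's speculation heuristic; 0.75 and 1.5 are
// the documented defaults of spark.speculation.quantile and
// spark.speculation.multiplier. Not the actual Spark implementation.
def speculatableTasks(
    finishedMs: Seq[Long],     // runtimes of tasks that already succeeded
    runningMs: Map[Int, Long], // taskId -> elapsed time of running tasks
    numTasks: Int,
    quantile: Double = 0.75,
    multiplier: Double = 1.5): Seq[Int] = {
  if (finishedMs.size < quantile * numTasks) Seq.empty
  else {
    val median = finishedMs.sorted.apply(finishedMs.size / 2)
    val threshold = multiplier * median
    runningMs.collect { case (id, elapsed) if elapsed > threshold => id }.toSeq
  }
}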

You can add these flags to your spark-submit, passing them with --conf, e.g.:

spark-submit \
--conf "spark.speculation=true" \
--conf "spark.speculation.multiplier=5" \
--conf "spark.speculation.quantile=0.90" \
--class "org.asyncified.myClass" "path/to/Vaquarkhanjar.jar"
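
The same settings can also be applied programmatically on a SparkConf before the context is created (a sketch; the app name is made up):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: programmatic equivalent of the --conf flags above.
val sparkConf = new SparkConf()
  .setAppName("speculation-demo")
  .set("spark.speculation", "true")
  .set("spark.speculation.multiplier", "5")
  .set("spark.speculation.quantile", "0.90")
val sc = new SparkContext(sparkConf)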

Note: the Spark driver spends a lot of time on speculation checks when managing a large number of tasks, so enable it only if needed.