Hadoop speculative execution testing

I am working with Hadoop 1.1.2 for my master's thesis.

I am studying a new algorithm for speculative tasks, so as a first step I am trying to make some changes to the code.

Unfortunately, even with two nodes I cannot trigger speculative execution. I added some logging statements to the DefaultTaskSelector class (the class that selects speculative tasks), but after initialization this class is never called by the FairScheduler class.

I also enabled the "speculative" option in the config file (mapred-site...xml), but nothing changed.

So my question is: how can I trigger, or force, speculative execution?

Regards

Answer 1

Speculative execution typically kicks in when multiple map tasks are running and one or more of them lag noticeably behind the others. A good way to make that happen:

  • Set up Hive.
  • Set up a partitioned table.
  • Make sure the data is large enough to make many mappers run, i.e. at least a few dozen HDFS blocks' worth of data.
  • Load data into the partitions so that one partition is highly skewed, holding far more data than the others.
  • Run a select * from the table.

With this setup you may see speculative execution kick in.

If not, feel free to follow up here. I can provide further suggestions (e.g. moderately complex queries that are likely to induce speculative execution).
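Why skew helps can be illustrated with a deliberately simplified straggler check. This is an assumption-laden model, not Hadoop 1.1.2's actual selection logic (which is more involved): a running task is flagged when its progress trails the average progress of its peers by more than a fixed gap. The class name StragglerSketch and the 0.2 gap are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Deliberately simplified straggler check (hypothetical, NOT Hadoop's heuristic).
 * A task is flagged when its progress trails the mean progress by more than GAP.
 */
public class StragglerSketch {

    /** Assumed threshold for this illustration only. */
    static final double GAP = 0.2;

    /** Returns indices of unfinished tasks (progress in 0.0..1.0) lagging the mean by more than GAP. */
    static List<Integer> stragglers(double[] progress) {
        List<Integer> result = new ArrayList<>();
        if (progress.length < 2) {
            return result; // no peers to compare against
        }
        double sum = 0;
        for (double p : progress) {
            sum += p;
        }
        double mean = sum / progress.length;
        for (int i = 0; i < progress.length; i++) {
            if (progress[i] < 1.0 && mean - progress[i] > GAP) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Balanced tasks: no one lags enough to be flagged.
        System.out.println(stragglers(new double[] {0.70, 0.75, 0.72, 0.68})); // prints []
        // One task stuck on an oversized, skewed partition.
        System.out.println(stragglers(new double[] {0.90, 0.95, 0.92, 0.30})); // prints [3]
    }
}
```

With evenly sized partitions every task progresses at a similar rate and nothing stands out; one oversized partition produces exactly the kind of laggard a speculative scheduler looks for.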

EDIT

Hive may be a bit of a stretch for you, but you can apply the spirit of this strategy to regular HDFS files as well. Write a MapReduce program with an intentionally skewed custom partitioner, i.e. one that sends an outsized share of the map output to a single reducer. (A partitioner decides which reducer receives each intermediate key, so the resulting straggler is a reduce task.)

Remember to use at least a few tens of HDFS blocks, so the TaskTrackers have a decent amount of work to chew on.
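A rough sketch of such a skewed partitioning rule, kept free of Hadoop dependencies so it runs on its own. In a real job, the body of getPartition would go into a subclass of org.apache.hadoop.mapreduce.Partitioner (overriding getPartition(key, value, numPartitions)) and be registered with Job.setPartitionerClass; with only one reduce task there is nothing to skew. The class name SkewedPartitionDemo and the roughly 9-in-10 skew are hypothetical choices.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hadoop-free sketch of an intentionally skewed partitioning rule (hypothetical).
 * In a real job this logic would live in getPartition(key, value, numPartitions)
 * of a subclass of org.apache.hadoop.mapreduce.Partitioner.
 */
public class SkewedPartitionDemo {

    /** Hypothetical skew: about 9 out of 10 keys go to partition 0. */
    static final int HOT_BUCKETS_OUT_OF_TEN = 9;

    static int getPartition(String key, int numPartitions) {
        if (numPartitions <= 1) {
            return 0; // a single reducer receives everything anyway
        }
        // Non-negative hash, the same idiom Hadoop's HashPartitioner uses.
        int hash = key.hashCode() & Integer.MAX_VALUE;
        if (hash % 10 < HOT_BUCKETS_OUT_OF_TEN) {
            return 0; // the overloaded "hot" reducer
        }
        // Spread the remaining keys over partitions 1..numPartitions-1.
        return 1 + hash % (numPartitions - 1);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int i = 0; i < 10_000; i++) {
            counts.merge(getPartition("key-" + i, numPartitions), 1, Integer::sum);
        }
        // Partition 0 should receive roughly 90% of the keys.
        System.out.println(counts);
    }
}
```

The reducer that owns partition 0 then has to process most of the map output while its peers finish early, which makes it a natural candidate for a speculative attempt.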

Answer 2

You can enable speculative execution per job with the two methods setMapSpeculativeExecution(boolean) and setReduceSpeculativeExecution(boolean) on JobConf, the MapReduce job configuration. Note that these only allow speculation; the scheduler still launches a speculative attempt only for a task that lags behind its peers.