Why can't we calculate job execution time in Hadoop?


My question is related to the straggler problem. Take sorting, for example: it is an algorithm whose complexity we know, so we can calculate its running time when it is executed on a fixed set of data.

Why can't we obtain the job execution time in Hadoop?

If we could obtain the job or task execution time, we could identify straggler tasks quickly, without needing a separate algorithm to work out which task is a straggler.


There are 2 best solutions below

1
On BEST ANSWER

You can't reliably estimate how long a job will take before running it. After running your MapReduce job, you can base an estimate on the time it actually took. MapReduce performance always depends on your cluster capacity (RAM size, CPU cores and network bandwidth) and on how many reducers you set for the job.

Before running the job, you can only make rough assumptions, for example based on your RAM size divided by the input split size.
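To make that kind of rough assumption concrete, here is a minimal back-of-the-envelope sketch. It is not something Hadoop computes for you. It assumes one map task per input split, that the cluster runs a fixed number of tasks at a time (`parallel_tasks`, a hypothetical figure you might derive from your available RAM), and that you already measured an average task time on an earlier run. All the numbers are made up for illustration.

```python
import math

def estimate_map_phase_seconds(input_bytes, split_bytes, parallel_tasks, avg_task_seconds):
    """Rough map-phase estimate: number of task waves times the average task time."""
    # Assumption: one map task per input split.
    num_tasks = math.ceil(input_bytes / split_bytes)
    # Assumption: tasks run in batches ("waves") of at most parallel_tasks.
    waves = math.ceil(num_tasks / parallel_tasks)
    return num_tasks, waves, waves * avg_task_seconds

GB = 1024 ** 3
MB = 1024 ** 2

# Hypothetical numbers: 10 GB of input, 128 MB splits, 20 tasks at a time,
# and 30 s per task as measured on a previous run of the same job.
tasks, waves, seconds = estimate_map_phase_seconds(10 * GB, 128 * MB, 20, 30.0)
print(tasks, waves, seconds)
```

This ignores the shuffle and reduce phases, data skew and slow nodes, which is why such a figure is only a guess and why stragglers still have to be detected at run time.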

2
On

The job execution time and the task execution time are available in the JobTracker web UI. I hope that is what you are looking for. The web UI is available on port 50030 of your JobTracker. If it's a YARN-based setup, the URL would be http://:8088
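On a YARN setup the same information can also be read programmatically: the ResourceManager serves a REST endpoint at `/ws/v1/cluster/apps` on that port, which returns JSON with an `elapsedTime` field (in milliseconds) per application. Below is a minimal sketch of pulling the durations out of such a response. The sample JSON is a trimmed, made-up payload, and `job_durations_seconds` is just an illustrative helper name.

```python
import json

def job_durations_seconds(apps_json):
    """Map application id -> elapsed seconds from a cluster apps response body."""
    payload = json.loads(apps_json)
    # "apps" can be null when the cluster has no applications to report.
    apps = (payload.get("apps") or {}).get("app", [])
    # elapsedTime is reported in milliseconds.
    return {app["id"]: app["elapsedTime"] / 1000.0 for app in apps}

# Made-up sample; a real response carries many more fields per application.
sample = '{"apps": {"app": [{"id": "application_0001", "name": "sort", "elapsedTime": 25196}]}}'
print(job_durations_seconds(sample))
```

To use it against a live cluster you would fetch the response body from your ResourceManager first (for example with `urllib.request.urlopen`) and pass the text to the function.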