Input Splits in Hadoop

Asked by Harshi on 11 February 2016 at 07:00

If the input file size is 200MB, there will be 4 blocks/input splits (assuming a 64MB block size), but each data node will have a mapper running on it. If all 4 input splits are on the same data node, will only one map task be executed?
Or how does the number of map tasks depend on the input splits?
Also, will a TaskTracker run on every data node, and the JobTracker on one data node in the cluster?
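To check the block/split arithmetic in the question, here is a minimal Python model of the split-size rule in Hadoop's new-API `FileInputFormat` (`org.apache.hadoop.mapreduce.lib.input`): split size = max(minSize, min(maxSize, blockSize)), plus a 10% "slop" so that a small tail is folded into the last split instead of becoming a tiny split of its own. It is a sketch, not Hadoop's code: the function names `compute_split_size` and `split_lengths` are hypothetical, a 64MB block size (the Hadoop 1.x default) is assumed, and the file is assumed to be splittable (a gzip file, for example, becomes a single split).

```python
from typing import List

MB = 1024 * 1024
SPLIT_SLOP = 1.1      # same 10% slop constant as Hadoop's FileInputFormat
LONG_MAX = 2**63 - 1  # stands in for Java's Long.MAX_VALUE


def compute_split_size(block_size: int, min_size: int = 1, max_size: int = LONG_MAX) -> int:
    """splitSize = max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_length: int, block_size: int = 64 * MB,
                  min_size: int = 1, max_size: int = LONG_MAX) -> List[int]:
    """Return the byte length of each input split for one splittable file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    lengths = []
    remaining = file_length
    # Cut full-size splits while more than 1.1 splits' worth of data remains.
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    # The tail (at most 1.1 * split_size) becomes the last split.
    if remaining != 0:
        lengths.append(remaining)
    return lengths


if __name__ == "__main__":
    splits = split_lengths(200 * MB)               # 64MB block size (Hadoop 1.x default)
    print(len(splits), [s // MB for s in splits])  # 4 [64, 64, 64, 8]
    print(len(split_lengths(70 * MB)))             # 1 (the 6MB tail joins a single 70MB split)
```

With `FileInputFormat`, each split gets exactly one map task, so the length of the returned list is the number of map tasks for that file.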

Answered by Mrinal on 19 August 2016 at 15:24

The number of map tasks depends entirely on the number of input splits (one map task per split), not on where the blocks/splits are stored. So in your case there will be 4 map tasks.

As for all of them being on one node, remember that each block also has replicas on other nodes (commonly 3 copies). MapReduce tries to take advantage of 'data locality', and the other thing to consider is resource availability: for each split, Hadoop looks for a data node that holds a replica of the block and also has resources available to run the task there. So you can indeed end up in the situation you describe, where one node holds replicas of all 4 blocks and has the resources the job needs, and all the work runs on it. But there will still be 4 map tasks, that is for sure.
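The locality reasoning in this answer can be illustrated with a toy, greedy scheduler. This is a simplification and not how the JobTracker or YARN actually place tasks (real schedulers also weigh rack locality and may hold a task back briefly to wait for a local slot); the function `assign_map_tasks`, the node names and the slot counts are all hypothetical. What it shows is that the number of tasks always equals the number of splits, while where they run depends on replica locations and free capacity.

```python
from typing import Dict, List


def assign_map_tasks(split_hosts: List[List[str]], free_slots: Dict[str, int]) -> List[str]:
    """Place exactly one map task per input split, preferring data-local nodes.

    split_hosts[i] lists the nodes holding a replica of split i's block;
    free_slots maps node name -> free map slots. Returns one node per split.
    """
    slots = dict(free_slots)  # work on a copy; the caller's dict is untouched
    placement = []
    for hosts in split_hosts:
        # Data-local choice: a node that stores a replica and still has a slot.
        local = [h for h in hosts if slots.get(h, 0) > 0]
        if local:
            node = local[0]
        else:
            # Non-local fallback: any node with capacity reads the block remotely.
            spare = [h for h, n in slots.items() if n > 0]
            if not spare:
                # A real scheduler would queue the task until a slot frees up.
                raise RuntimeError("no free map slots in this toy model")
            node = spare[0]
        slots[node] -= 1
        placement.append(node)
    return placement


if __name__ == "__main__":
    # Hypothetical replica locations (replication factor 3); dn1 holds all 4 blocks.
    splits = [["dn1", "dn2", "dn3"], ["dn1", "dn3", "dn4"],
              ["dn1", "dn2", "dn4"], ["dn1", "dn3", "dn2"]]
    print(assign_map_tasks(splits, {"dn1": 4, "dn2": 2, "dn3": 2, "dn4": 2}))
    # ['dn1', 'dn1', 'dn1', 'dn1']: 4 map tasks, all on one node
    print(assign_map_tasks(splits, {"dn1": 1, "dn2": 2, "dn3": 2, "dn4": 2}))
    # ['dn1', 'dn3', 'dn2', 'dn3']: still 4 tasks, spread by free capacity
```

In both runs the output has 4 entries, one per split; only the placement changes with the free slots.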
