Hadoop Replication mechanism

In HDFS, the block placement policy places one replica on the same rack as the writer and the other two replicas on different nodes of a different rack.

But why doesn't it place one of the other two replicas on the same rack as the first one? Wouldn't that be more efficient, since it would avoid spending so much bandwidth writing the other two replicas to the other rack?
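To make the bandwidth question concrete, here is a small sketch that counts how many transfers cross a rack boundary for the two layouts being compared. It assumes two hypothetical racks, "A" (the writer's rack) and "B", and models two write strategies: a pipeline, where the writer sends the block to the first replica and each DataNode forwards it to the next (HDFS writes replicas this way), and a fan-out, where the writer sends every replica itself. The function names and rack labels are illustrative only, not part of any Hadoop API.

```python
from typing import List


def cross_rack_transfers_pipeline(writer_rack: str, replica_racks: List[str]) -> int:
    """Count rack-boundary crossings when the block flows writer -> r1 -> r2 -> r3."""
    hops = [writer_rack] + replica_racks
    # Each consecutive pair in the chain is one network transfer.
    return sum(1 for src, dst in zip(hops, hops[1:]) if src != dst)


def cross_rack_transfers_fanout(writer_rack: str, replica_racks: List[str]) -> int:
    """Count rack-boundary crossings when the writer sends every replica directly."""
    return sum(1 for rack in replica_racks if rack != writer_rack)


# Layout described in the question: one replica on the writer's rack, two on another rack.
one_local_two_remote = ["A", "B", "B"]
# Proposed alternative: two replicas on the writer's rack, one on another rack.
two_local_one_remote = ["A", "A", "B"]

for name, layout in [("1 local + 2 remote", one_local_two_remote),
                     ("2 local + 1 remote", two_local_one_remote)]:
    print(name,
          "pipeline:", cross_rack_transfers_pipeline("A", layout),
          "fan-out:", cross_rack_transfers_fanout("A", layout))
```

In this toy model both layouts cross the rack boundary exactly once when the replicas are pipelined; only the fan-out model makes the two-remote layout cost an extra cross-rack transfer.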

Answer by Alatau:

Data replication is performed as follows:

The NameNode selects new DataNodes to host the replicas: it balances data placement across nodes and compiles a list of target nodes for replication.

The 1st replica is placed on the first node in the list.

The 2nd replica is copied to another node in the same rack as the first.

The 3rd replica is written to an arbitrary node in a different rack.

Any remaining replicas are placed arbitrarily.
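The steps above can be sketched as a small selection function. This is a toy model, not the HDFS implementation: `place_replicas`, `rack_of`, and the node and rack names are hypothetical, the NameNode's balancing is reduced to an already-ordered candidate list, and "arbitrary" is modelled as a random choice. The fallback that relaxes a rack constraint when no suitable node exists is also an assumption.

```python
import random
from typing import Callable, Dict, List, Optional


def place_replicas(candidates: List[str], rack_of: Dict[str, str],
                   replication: int = 3, seed: Optional[int] = None) -> List[str]:
    """Toy model of the placement steps described above (not HDFS code)."""
    if not 1 <= replication <= len(candidates):
        raise ValueError("replication must be between 1 and the number of nodes")
    rng = random.Random(seed)
    # 1st replica: the first node in the NameNode's candidate list.
    chosen = [candidates[0]]
    first_rack = rack_of[chosen[0]]

    def pick(allowed: Callable[[str], bool]) -> str:
        options = [n for n in candidates if n not in chosen and allowed(n)]
        if not options:
            # Assumption: relax the rack constraint when no node satisfies it.
            options = [n for n in candidates if n not in chosen]
        return rng.choice(options)

    if replication >= 2:
        # 2nd replica: another node in the same rack as the first.
        chosen.append(pick(lambda n: rack_of[n] == first_rack))
    if replication >= 3:
        # 3rd replica: an arbitrary node in a different rack.
        chosen.append(pick(lambda n: rack_of[n] != first_rack))
    while len(chosen) < replication:
        # Remaining replicas: any node that does not already hold a copy.
        chosen.append(pick(lambda n: True))
    return chosen


# Hypothetical cluster: two racks with two DataNodes each.
racks = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack2", "dn4": "rack2"}
# The result always starts with dn1, dn2; the third entry is dn3 or dn4.
print(place_replicas(["dn1", "dn2", "dn3", "dn4"], racks, seed=42))
```

In this sketch the second copy stays on the first node's rack, while the third copy is the one that keeps the block available if that whole rack is lost.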