Wrong IP mapping on some data nodes in hadoop


I have a 7-node Hadoop setup whose hostnames are configured through /etc/hosts. It looks like this:

1.2.3.4 hadoop-master
1.2.3.5 hadoop-slave-1
1.2.3.6 hadoop-slave-2
1.2.3.7 hadoop-slave-3
1.2.3.8 hadoop-slave-4
1.2.3.9 hadoop-slave-5
1.2.3.10 hadoop-slave-6

The problem is that some nodes have a wrong mapping for hadoop-slave-1: they map it to 1.2.3.12 instead of 1.2.3.5. The namenode has the correct mapping, so the data nodes show up fine in the namenode UI.
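To find out which nodes carry the bad entry, each node's hosts file can be compared against the table above. This is only a minimal sketch: `EXPECTED`, `parse_hosts` and `find_mismatches` are names made up for illustration, and it assumes the file follows the usual "IP, then one or more names" layout with `#` comments.

```python
# Expected mapping, copied from the /etc/hosts table in the question.
EXPECTED = {
    "hadoop-master": "1.2.3.4",
    "hadoop-slave-1": "1.2.3.5",
    "hadoop-slave-2": "1.2.3.6",
    "hadoop-slave-3": "1.2.3.7",
    "hadoop-slave-4": "1.2.3.8",
    "hadoop-slave-5": "1.2.3.9",
    "hadoop-slave-6": "1.2.3.10",
}


def parse_hosts(text):
    """Return {hostname: ip} from hosts-file text, skipping comments and blank lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry seen for a name
    return mapping


def find_mismatches(hosts_text, expected=EXPECTED):
    """Return {hostname: (expected_ip, actual_ip_or_None)} for every wrong or missing entry."""
    actual = parse_hosts(hosts_text)
    return {
        name: (ip, actual.get(name))
        for name, ip in expected.items()
        if actual.get(name) != ip
    }


if __name__ == "__main__":
    with open("/etc/hosts") as f:
        for name, (want, got) in find_mismatches(f.read()).items():
            print(f"{name}: expected {want}, found {got}")
```

Run on every node, it prints nothing where the file already matches the table.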

The question is: is it safe to just change the /etc/hosts file and restart the services? I am worried that it could corrupt some blocks related to the hadoop-slave-1 node.

I can think of 2 ways to fix this:

  1. Fix the /etc/hosts file on the affected nodes and restart the services. I am not sure whether this could corrupt blocks. Is that concern justified?

  2. Temporarily remove hadoop-slave-1 from the cluster, re-balance so that all data is distributed across the remaining 6 nodes, then add the server back and re-balance across 7 nodes. The problem is that the cluster holds a lot of data, so re-balancing would be a heavy job, put pressure on the namenode, and could cause heap issues.

Is there any other solution in this situation? Also, which tool or utility would you suggest for replicating data to another Hadoop cluster?

Help much appreciated!!

There are 1 best solutions below

In general, using /etc/hosts is discouraged if you have a functional DNS server (most routers provide one).

For example, in my environment I can ping namenode.lan by name.
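Whichever source the names come from, each node can check what a hostname actually resolves to. A minimal sketch, assuming Python is available on the nodes; `check_resolution` is a made-up helper, and the expected table is the one from the question. `socket.gethostbyname` goes through the system resolver, so on a typical Linux setup it reflects both /etc/hosts and DNS.

```python
import socket

# Expected addresses, taken from the table in the question.
EXPECTED = {
    "hadoop-master": "1.2.3.4",
    "hadoop-slave-1": "1.2.3.5",
    "hadoop-slave-2": "1.2.3.6",
}


def check_resolution(expected, resolve=socket.gethostbyname):
    """Return {hostname: (expected_ip, resolved_ip_or_None)} for names that resolve wrongly."""
    wrong = {}
    for name, ip in expected.items():
        try:
            resolved = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved = None
        if resolved != ip:
            wrong[name] = (ip, resolved)
    return wrong


if __name__ == "__main__":
    for name, (want, got) in check_resolution(EXPECTED).items():
        print(f"{name}: expected {want}, resolved to {got}")
```

The `resolve` parameter exists only so the check can be exercised with a fake lookup instead of the real resolver.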


I think option 2 is the safest choice. The HDFS balancer (hdfs balancer) works fine.
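If you want to script the re-balancing step, something like the following could work. It is only a sketch: `balancer_command` and `run_balancer` are made-up helper names, it assumes the `hdfs` binary is on the PATH of the user running it, and the threshold of 10 is just the balancer's usual default, not a recommendation for your cluster.

```python
import subprocess


def balancer_command(threshold_percent=10):
    """Build the argument list for the HDFS balancer.

    -threshold is the allowed deviation, in percent of disk capacity,
    between a datanode's utilization and the cluster average.
    """
    return ["hdfs", "balancer", "-threshold", str(threshold_percent)]


def run_balancer(threshold_percent=10):
    """Run the balancer and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(balancer_command(threshold_percent), check=True)


if __name__ == "__main__":
    run_balancer()
```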

"and could cause heap issue"

Then stop the namenode, increase the heap, and start it back up. While you're at it, set up NameNode HA so you have no downtime.


Note: master/slave hostnames are really not descriptive. HDFS, YARN, Hive, HBase, and Spark each have a server-client architecture with its own master services, and those should not all be located on one machine.