distcp between nameservice1 and nameservice2

4k Views Asked by At

we have CDH 5.2 with Cloudera Manager 5.

We want to copy data from nameservice2 to nameservice1

Both clusters are on same CDH version

When I tried hadoop distcp hdfs://nameservice2/foo/bar hdfs://nameservice1/bar/foo

I got error

java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice2

So I added following config from Nameservice2 to Nameservice1

HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml in Cloudera manager (Gateway Default Group)

<property>
<name>dfs.nameservices</name>
<value>nameservices2</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.nameservices2</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservices2</name>
<value>namenode36,namenode405</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.cc:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50470</value>
</property>

But I am still getting same error.

Any workaround this ?

thanks

1

There are 1 best solutions below

2
On

In HA enabled HDFS namenode nameservice1,nameservice2 are logical names, you cannot use ports along with that logical name.

you have two methods.

Easy method is to find the active namenodes and use the active namenode:port in the distcp command as follows. Namenode web UI can be used for finding active namenodes of two clusters.

hadoop distcp hdfs://hnn001.prod.cc:8020:8020/foo/bar hdfs://<dest-cluster-active-nn-hostname>:8020/bar/foo

Another method is to use logical names of two clusters as follow, But before trying the below command make sure you have properly configured nameservice1 and nameservice2 in your client hdfs-site.xml.

hadoop distcp hdfs://nameservice2/foo/bar hdfs://nameservice1/bar/foo

Confiruting remote cluster's nameservice in local cluster.

Looks like nameservice2 is your local and nameservice1 is your remote. You need to keep the all associated properties of nameservice1 and nameservice2 in the local cluster ie. Your local cluster's client hdfs-site.xml files should be as follows.

<configuration>
<!-- Available nameservices -->
<property>
<name>dfs.nameservices</name>
<value>nameservices1,nameservices2</value>
</property>

<!-- Local nameservice2 properties -->
<property>
<name>dfs.client.failover.proxy.provider.nameservices2</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservices2</name>
<value>namenode36,namenode405</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.cc:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode36</name>
<value>hnn001.prod.com:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices2.namenode405</name>
<value>hnn002.prod.com:50470</value>
</property>

<!-- Remote nameservice1 properties -->
<!-- You can find these properties in the remote machine's hdfs-site.xml file -->

<property>
<name>dfs.client.failover.proxy.provider.nameservices1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservices1</name>
<value>namenodeXX,namenodeYY</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices1.namenodeXX</name>
<value><Remote-nn1>:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices1.namenodeXX</name>
<value><Remote-nn1>:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices1.namenode**XX**</name>
<value><Remote-nn1>:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices1.namenodeXX</name>
<value><Remote-nn1>:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:54321</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.nameservices1.namenodeYY</name>
<value><Remote-nn2>:50470</value>
</property>

<!-- Other properties --> 

</configuration>

In the above configuration files replace all place holders like YY XX with corresponding values in the remote machine's hdfs site.xml.