I have a Hadoop cluster running Apache Hadoop 2.7.1. It is highly available and consists of five nodes:
mn1, mn2, dn1, dn2, dn3
If we access WebHDFS from a browser to open a file called myfile, which has replication factor = 3 and exists on dn1, dn2, and dn3, we issue the following request from the browser:
http://mn1:50070/webhdfs/v1/hadoophome/myfile/?user.name=root&op=OPEN
mn1 then redirects this request to dn1, dn2, or dn3, and we get the file.
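The redirect is also easy to observe from the command line (a sketch using curl against the same URL; -i prints the HTTP 307 redirect whose Location header points at one of the DataNodes):

# Show the 307 redirect the NameNode returns for an OPEN request
curl -i "http://mn1:50070/webhdfs/v1/hadoophome/myfile/?user.name=root&op=OPEN"

# Or follow the redirect and print the file contents directly
curl -L "http://mn1:50070/webhdfs/v1/hadoophome/myfile/?user.name=root&op=OPEN"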
We can also get the file from Hadoop itself with the following command:
hdfs dfs -cat /hadoophome/myfile
But in the case of DataNode failure (suppose dn1 and dn3 are down now), if we issue the command
hdfs dfs -cat /hadoophome/myfile
we can still retrieve the file.
But if we issue the WebHDFS request from the browser while in this state:
http://mn1:50070/webhdfs/v1/hadoophome/myfile/?user.name=root&op=OPEN
mn1 will sometimes redirect the request to dn1 or dn3, which are dead, and sometimes it redirects the request to dn2 and I can retrieve the file.
Shouldn't mn1 redirect WebHDFS requests to live DataNodes only? How should this error be handled? Should it be handled by the application?
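(One blind workaround on the application side would be to simply retry the request until the NameNode happens to pick the live DataNode; a sketch with curl, where the retry count and sleep are arbitrary:

# -f makes curl fail when the redirect target is unreachable,
# so we retry; break out of the loop on the first success.
for i in 1 2 3 4 5; do
    curl -s -L -f "http://mn1:50070/webhdfs/v1/hadoophome/myfile/?user.name=root&op=OPEN" && break
    sleep 2
done
)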
Edit hdfs-site.xml and lower dfs.namenode.heartbeat.recheck-interval.
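Assuming the standard Hadoop 2.x property name, the entry looks like this (10000 ms matches the 50-second timeout computed below):

<property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <value>10000</value>
</property>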
This property is measured in milliseconds, and with the value above you will get a timeout of 50 seconds before a DataNode is considered dead, because the default value of dfs.heartbeat.interval is 3 seconds and the timeout to consider a DataNode dead is

timeout = 2 * heartbeat.recheck.interval + 10 * heartbeat.interval

so timeout = 2 * (10 seconds) + 10 * (3 seconds) = 50 seconds. Once the NameNode marks dn1 and dn3 as dead, it stops redirecting WebHDFS clients to them.
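After restarting the NameNodes so the change takes effect, you can watch a stopped DataNode being declared dead on the new schedule (hdfs dfsadmin -report lists live and dead DataNodes):

# Stop one DataNode, then run this repeatedly; the node should move
# from the live list to the dead list after roughly 50 seconds.
hdfs dfsadmin -report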