I am using Apache Hadoop 2.7.1 on a cluster that consists of three nodes:
nn1 (master name node)
nn2 (second name node)
dn1 (data node)
We know that if we configure high availability in this cluster,
we will have two main nodes: one active and one standby.
If we also configure the cluster to be addressed by a name service, the following scenario should work.
The scenario is:
1- nn1 is active and nn2 is standby.
So if we want to get a file (called myfile) from dn1, we can send this URL from a browser (a WebHDFS request):
http://nn1/webhdfs/v1/hadoophome/myfile/?user.name=root&op=OPEN
2- The name node daemon on nn1 is killed, so according to high availability nn1 becomes standby and nn2 becomes active. We can now get myfile by sending this web request to nn2, because it is the active node:
http://nn2/webhdfs/v1/hadoophome/myfile/?user.name=root&op=OPEN
So configuring a name service with high availability seems to be enough to handle name node failure and keep WebHDFS working.
So what is the benefit of adding HttpFS here, given that WebHDFS with high availability is said to be unsupported and that we have to configure HttpFS?
I understand that this is a follow up of your previous question here.
`WebHDFS` and `HttpFs` are two different things. WebHDFS is part of the Namenode, and it is the NN that handles the `WebHDFS` API calls, whereas HttpFs is a separate service independent of the Namenodes, and the `HttpFs` server handles the API calls.

Your REST API calls will remain the same irrespective of which NN is in Active state. HttpFs, being HA aware, will direct the request to the current Active NN.

Let us assume the `HttpFs` server is started in `nn1`.

WebHDFS `GET` request

This is served by the Namenode daemon running in `nn1`.

Scenario 1: `nn1` is Active. The request will be rewarded with a valid response.

Scenario 2: `nn2` is Active. Making the same request will fail, as there is no Active NN running in `nn1`. So, the REST call must be modified to request `nn2`. Now, this will be served by the NN daemon running in `nn2`.

HttpFs `GET` request

This request is served by the `HttpFs` service running in `nn1`.

Scenario 1: `nn1` is Active. The `HttpFs` server running in `nn1` will direct the request to the current Active Namenode, `nn1`.

Scenario 2: `nn2` is Active. The `HttpFs` server running in `nn1` will direct the request to the current Active Namenode, `nn2`.

In both scenarios, the REST call is the same. The request will fail if the `HttpFs` server is down.

`nameservice` is the logical name given to the pair of Namenodes. This `nameservice` is not an actual Host and cannot be used in place of the Host parameter in the REST API calls.
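
To make the difference concrete, here is a minimal Python sketch of what a client has to do in each case. It is only an illustration under stated assumptions: the helper names `webhdfs_url` and `first_successful` are hypothetical, the URL shape is copied from the question, and the host strings are taken as given (they may need a port appended, which depends on how the cluster is configured). The HTTP call itself is passed in as a `fetch` function so the logic can be tried without a running cluster.

```python
from typing import Callable, Iterable, Tuple
from urllib.parse import urlencode


def webhdfs_url(host: str, path: str, op: str = "OPEN", user: str = "root") -> str:
    """Build a REST URL in the same shape as the ones in the question.

    `host` is whatever serves the request: a Namenode for WebHDFS,
    or the HttpFs server for HttpFs. The path and query look the same.
    """
    query = urlencode({"user.name": user, "op": op})
    return f"http://{host}/webhdfs/v1/{path.strip('/')}?{query}"


def first_successful(
    hosts: Iterable[str],
    path: str,
    fetch: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Try each host in order and return (host, body) for the first that answers.

    With plain WebHDFS the client has to carry this loop itself, because
    only the Active Namenode will serve the request. `fetch` is assumed to
    raise OSError (urllib's URLError and HTTPError are subclasses of it)
    when a host cannot serve the call.
    """
    for host in hosts:
        try:
            return host, fetch(webhdfs_url(host, path))
        except OSError:
            continue  # this host did not serve the request, try the next one
    raise RuntimeError("no host served the request")
```

With WebHDFS, the client would call something like `first_successful(["nn1", "nn2"], "hadoophome/myfile", fetch)`, where `fetch` could be `lambda url: urllib.request.urlopen(url).read()`. With HttpFs, the client builds a single URL with `webhdfs_url` against the HttpFs server and never needs to know which Namenode is Active; the trade-off is that the HttpFs server itself becomes the thing that must stay up.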