Is there any difference between High Availability (HA) for the NameNode and for HDFS?


I am confused about high availability for HDFS versus high availability for the NameNode. Are these two the same thing, or are they different?


There are 3 best solutions below

0
On

More or less. In a standard (non-HA) cluster the NameNode is a single point of failure: when it goes down, the whole HDFS cluster goes down, because no other role or node can take over its job. So when we say HDFS High Availability, we mean running a standby NameNode that takes over from the active one when it fails.
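The failover idea can be sketched with a toy model. This is not Hadoop code: the classes `ToyNameNode` and `ToyCluster` and their methods are hypothetical names made up for illustration, and real failover also involves shared edit logs and fencing, which are left out here.

```python
class ToyNameNode:
    """Hypothetical stand-in for a NameNode process (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.alive = True


class ToyCluster:
    """Toy cluster: it can serve requests only while a live NameNode is active."""

    def __init__(self, active, standby=None):
        self.active = active
        self.standby = standby

    def failover(self):
        # Promote the standby only if the active is down and the standby is alive.
        if (not self.active.alive
                and self.standby is not None
                and self.standby.alive):
            self.active, self.standby = self.standby, self.active
            return True
        return False

    def is_available(self):
        if self.active.alive:
            return True
        return self.failover()


# Non-HA cluster: a single NameNode, no standby.
non_ha = ToyCluster(ToyNameNode("nn1"))
# HA cluster: an active NameNode plus a standby.
ha = ToyCluster(ToyNameNode("nn1"), ToyNameNode("nn2"))

# Simulate the active NameNode host going down in both clusters.
non_ha.active.alive = False
ha.active.alive = False

non_ha_available = non_ha.is_available()
ha_available = ha.is_available()
print(non_ha_available, ha_available)
```

In this sketch the non-HA cluster becomes unavailable, while the HA cluster stays available because `nn2` is promoted to active.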

So, to answer your question: yes. You can call it 'HDFS NameNode High Availability', 'HDFS HA', or 'NameNode HA'; they all refer to the same thing: keeping the HDFS cluster working when the active NameNode host is down.

0
On

HDFS is the distributed file system of the Hadoop project. It handles distributed storage, i.e., it stores data as blocks across the cluster nodes.

HDFS has a master/slave architecture: one or more masters, i.e., NameNode(s), and one or more slave nodes, i.e., DataNodes.

HDFS has two types of data:

  • Metadata - Managed by NameNode(s)
  • Data - Managed by DataNodes

In HDFS, metadata plays an important role in the storage and retrieval of the actual data, so the availability of the NameNode is critical to the health of the entire cluster.
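A minimal sketch of why the metadata matters, using a toy in-memory model rather than real HDFS. All names here (`MetadataStore`, `BlockStore`, `write_file`, `read_file`, the tiny `BLOCK_SIZE`) are assumptions for illustration; real HDFS blocks are far larger and are replicated.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny, for illustration only


class BlockStore:
    """Hypothetical DataNode stand-in: holds block contents only."""

    def __init__(self):
        self.blocks = {}


class MetadataStore:
    """Hypothetical NameNode stand-in: knows which blocks make up each file."""

    def __init__(self):
        self.block_map = {}  # path -> list of (block_id, datanode_index)


def write_file(namenode, datanodes, path, data):
    locations = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_no = offset // BLOCK_SIZE
        block_id = f"{path}#{block_no}"
        dn_index = block_no % len(datanodes)  # spread blocks round-robin
        datanodes[dn_index].blocks[block_id] = data[offset:offset + BLOCK_SIZE]
        locations.append((block_id, dn_index))
    namenode.block_map[path] = locations


def read_file(namenode, datanodes, path):
    # Without the metadata lookup there is no way to find or order the blocks.
    locations = namenode.block_map[path]
    return b"".join(datanodes[i].blocks[block_id] for block_id, i in locations)


namenode = MetadataStore()
datanodes = [BlockStore(), BlockStore()]
write_file(namenode, datanodes, "/demo.txt", b"hello hdfs!")
roundtrip = read_file(namenode, datanodes, "/demo.txt")

# Simulate losing the metadata: the blocks are still on the DataNodes,
# but the file can no longer be located or reassembled.
namenode.block_map.clear()
try:
    read_file(namenode, datanodes, "/demo.txt")
    readable_without_metadata = True
except KeyError:
    readable_without_metadata = False
```

The blocks survive on the toy DataNodes after the metadata is cleared, yet the read fails, which is the reason the NameNode's availability decides whether the cluster is usable.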

To make the NameNode highly available, HDFS introduced the feature known as HDFS High Availability, or NameNode High Availability.

Note: HDFS HA and NameNode HA are the same topic.

0
On

HDFS High Availability provides the option of running two NameNodes in the same cluster, in an active/passive configuration.
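As a rough illustration of what such a two-NameNode setup looks like in configuration, the sketch below generates an `hdfs-site.xml` fragment. It assumes the Quorum Journal Manager variant of HA (the answer does not say which variant is meant). The nameservice ID `mycluster`, the NameNode IDs `nn1`/`nn2`, and all hostnames and ports are placeholder values, and the helper functions are hypothetical. A real deployment needs further settings, such as fencing.

```python
import xml.etree.ElementTree as ET


def ha_properties(nameservice, namenodes, shared_edits_uri):
    """Build core HA properties. `namenodes` maps NameNode ID -> hostname."""
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        "dfs.namenode.shared.edits.dir": shared_edits_uri,
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    for nn_id, host in namenodes.items():
        # One RPC address per NameNode; port 8020 is a placeholder choice.
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    return props


def to_hdfs_site_xml(props):
    """Render the properties in the <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


props = ha_properties(
    "mycluster",
    {"nn1": "machine1.example.com", "nn2": "machine2.example.com"},
    "qjournal://node1.example.com:8485;node2.example.com:8485;"
    "node3.example.com:8485/mycluster",
)
xml_text = to_hdfs_site_xml(props)
print(xml_text)
```

Clients then address the logical nameservice (`mycluster` here) rather than a specific NameNode host, so they keep working after a failover.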

My understanding is that both terms refer to the same thing.

You could get a better understanding by referring to the Cloudera documentation here.