Explanation of Cloudera architecture on cloud (Azure)


I am new to the Hadoop/Cloudera world, and I need to set up a Cloudera cluster on the Microsoft Azure cloud. If I understood correctly, there are two ways to install Cloudera on a cluster: using Cloudera Manager or through a manual installation. According to this schema, a dedicated machine is needed for Cloudera Manager, plus 3 master nodes.

But in this table it seems I can install Cloudera Manager directly on the Master Node.

So here are my doubts/questions:

  • 1) Is it necessary to have Cloudera Manager on a dedicated machine (and if so, why)? Or can it be installed directly on the master node?
  • 2) Why are there 3 master nodes? From what I understood, 2 master nodes can be used for high availability (they mirror each other, with the same configuration and services, and can be used for a hot switchover). What is the purpose of the third master node, and why is it different from the other two?
  • 3) What is the purpose of Cloudera Director, and how does it differ from Cloudera Manager? I've read that it can be used for automated deployments to the cloud, but it is not clear to me what exactly I could use it for.

Thanks in advance for any information.

Best answer:

The Cloudera documentation at https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ig_host_allocations.html shows that the number of master nodes varies with cluster size and high availability requirements:

  • for a small cluster with up to 10 worker nodes and without high availability, you can have just one master (not recommended for production)
  • for a small cluster with high availability, you can have two master nodes
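  • a larger cluster (up to 200 worker nodes) can have three master nodes. Note that their example still runs only two NameNode instances: the aim of the third master is to spread the workload over more hosts, not to add majority voting for the NameNode role. In the documented layouts, the third master typically hosts the third member of quorum-based services such as ZooKeeper and the HDFS JournalNodes, which need an odd number of members to keep a majority.
  • a very large cluster (up to 1000 worker nodes) can have five master nodes.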
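The sizing guidance in the list above can be condensed into a small lookup. This is a Python sketch, not a Cloudera tool or API: the thresholds come from the answer's summary of the 5.8 host-allocation page, `SMALL_CLUSTER_MAX_WORKERS` assumes the small HA layout shares the 10-worker limit of the first bullet, and all names are hypothetical. `tolerated_failures` shows the majority arithmetic behind odd-sized ensembles, assuming (as is common) that the extra masters carry quorum-based services such as ZooKeeper: two members tolerate no failures, three tolerate one, five tolerate two.

```python
from dataclasses import dataclass

# Assumption: the answer gives a 10-worker limit for the small non-HA layout
# and no separate limit for the small HA layout, so both share it here.
SMALL_CLUSTER_MAX_WORKERS = 10


@dataclass(frozen=True)
class MasterLayout:
    masters: int
    note: str


def suggested_masters(workers: int, high_availability: bool) -> MasterLayout:
    """Suggest a master-node count from the sizes summarised in the answer."""
    if workers < 1:
        raise ValueError("a cluster needs at least one worker node")
    if workers <= SMALL_CLUSTER_MAX_WORKERS:
        if high_availability:
            return MasterLayout(2, "small cluster with HA")
        return MasterLayout(1, "small cluster without HA (not for production)")
    # The larger layouts in the answer already run two NameNodes (HA),
    # so the HA flag is not consulted past this point.
    if workers <= 200:
        return MasterLayout(3, "larger cluster")
    if workers <= 1000:
        return MasterLayout(5, "very large cluster")
    raise ValueError("beyond the sizes covered by the referenced examples")


def tolerated_failures(members: int) -> int:
    """Members a majority-vote ensemble can lose and still keep a quorum."""
    majority = members // 2 + 1
    return members - majority


if __name__ == "__main__":
    for workers, ha in [(8, False), (8, True), (150, True), (800, True)]:
        layout = suggested_masters(workers, ha)
        # Print worker count, HA flag, master count, and quorum tolerance.
        print(workers, ha, layout.masters, tolerated_failures(layout.masters))
```

The point of the second function is that going from two to three members is what first buys tolerance of a single failure; going from three to four buys nothing extra, which is why these layouts jump from three masters to five.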

Similarly, in the first two cases above, the utility host that runs Cloudera Manager also runs all the other utility and edge roles. As the cluster grows, the examples add more utility hosts, and Cloudera Manager becomes the only utility role on its host.
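The placement rule in the previous paragraph can be sketched as a function from the number of masters to a utility-host layout. This is a hypothetical illustration: host names such as `utility-1` and the `extra_hosts` parameter are assumptions, since the answer only says that larger layouts add more utility hosts, not how many or exactly which roles each one carries.

```python
from __future__ import annotations


def utility_layout(masters: int, extra_hosts: int = 2) -> dict[str, list[str]]:
    """Sketch where Cloudera Manager runs, following the answer's description.

    Host names and ``extra_hosts`` are hypothetical.
    """
    if masters <= 2:
        # First two layouts (1 or 2 masters): one utility host runs
        # Cloudera Manager together with all other utility and edge roles.
        return {"utility-1": ["Cloudera Manager", "other utility roles", "edge roles"]}
    if extra_hosts < 1:
        raise ValueError("larger layouts need at least one additional utility host")
    # Larger layouts: Cloudera Manager is the only utility role on its host;
    # the remaining roles move to additional hosts (count is an assumption).
    layout = {"utility-1": ["Cloudera Manager"]}
    for i in range(2, 2 + extra_hosts):
        layout[f"utility-{i}"] = ["other utility/edge roles"]
    return layout


for masters in (1, 2, 3, 5):
    print(masters, utility_layout(masters))
```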

https://www.cloudera.com/products/product-components/cloudera-director.html describes Cloudera Director, a tool that helps you deploy and run Hadoop clusters in the public cloud (AWS, Azure, Google Cloud). Cloudera Director works with Cloudera Manager to provide centralised administration of cloud clusters. For the differences between Cloudera Director and Cloudera Manager, https://www.cloudera.com/documentation/director/2-2-x/topics/director_cdh_cluster_management.html is also a useful reference.