Cloudera Cluster Deployment bootstrap failed error on EC2 using Director


I have successfully deployed Cloudera Director and Cloudera Manager on EC2. I can access both the Director and Manager instances from a browser and can SSH into them. The Cloudera Manager server and agent are running fine; I verified this by checking their service status.

The next step is to deploy a cluster. However, the deployment fails with a "Bootstrap failed" error. I checked the application.log file on the Director instance and found the following exception under "Caused by:":

java.net.ConnectException: ConnectException invoking http://:7180/api/v6/commands/158: Connection refused (Connection refused)

After checking the service status, I found that the Cloudera Manager server service (cloudera-scm-server) somehow gets stopped during cluster deployment. Before starting the deployment, I had verified that cloudera-scm-server was up and running.

I tried deploying the cluster a number of times, using both t2.small and m4.large instance types, and I get the same exception every time.

After the error, if I restart cloudera-scm-server, it starts and works fine. But during cluster deployment it stops automatically, which I guess is what makes the deployment fail. I am not sure how or why this happens.

Any idea what the issue could be? Can someone provide pointers to help resolve it?

The versions used for the deployment are as follows:

  • Cloudera Director version - 2.4.1
  • Cloudera Manager version - 5.11.1
  • EC2 instance - tried with both t2.small and m4.large instance types
  • EC2 instance OS - RHEL 6.7, 64-bit
  • Cluster config selected - 1 master, 1 worker, 1 gateway
  • Cluster services selected - Core Hadoop with Spark on YARN (this includes following services - HDFS, Hive, Hue, Oozie, Spark on YARN, YARN, ZooKeeper)

Any help, input, or pointers to solve this issue would be greatly appreciated.

Thanks so much in advance.

-picku

Best answer

Picku,

My first guess, based on your symptoms, is that your Cloudera Manager instance is too small. Linux has an OOM (out-of-memory) killer that terminates processes when the system runs critically low on memory. This is likely why you find cloudera-scm-server no longer running after the deployment attempt. I believe you can look in /var/log/messages to find the "smoking gun" that implicates the OOM killer.
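As a rough way to look for that evidence, here is a minimal Python sketch that scans a log file for lines that look like OOM killer activity. The function names find_oom_lines and scan_log are made up for this illustration, and the search phrases ("invoked oom-killer", "Out of memory", "Killed process") are an assumption about how the kernel words these messages; the exact wording can differ between kernel versions, so treat a match as a hint rather than proof.

```python
import re

# Phrases the kernel commonly logs around an OOM kill. This list is an
# assumption for illustration; adjust it to what your kernel actually writes.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory|Killed process", re.IGNORECASE
)


def find_oom_lines(lines):
    """Return the lines (without trailing newline) that match OOM_PATTERN."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_log(path="/var/log/messages"):
    """Print every line in the given log file that looks like OOM killer output.

    Reading /var/log/messages usually requires root privileges.
    """
    with open(path, errors="replace") as log:
        for hit in find_oom_lines(log):
            print(hit)
```

Calling scan_log() as root on the Cloudera Manager instance right after a failed deployment should list any matching lines. If one of them names a java process at about the time cloudera-scm-server went down, that would support the too-little-memory explanation.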

Please refer to the Cloudera Enterprise Reference Architecture for AWS Deployments for recommendations on instance types. http://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_aws.pdf

Good luck!
David