Problem while creating Dataproc cluster: Component ranger failed to activate post-hdfs


I am working on a Dataproc cluster setup to test all of its features. I created a cluster template and have been running the creation command almost every day, but this week it stopped working. The error printed is:

ERROR: (gcloud.dataproc.clusters.create) Operation [projects/cluster_name/regions/us-central1/operations/number] failed: Failed to initialize node cluster_name-m: Component ranger failed to activate post-hdfs See output in: gs://cluster_bucket/google-cloud-dataproc-metainfo/number/cluster_name-m/dataproc-post-hdfs-startup-script_output.

When I open the output file referenced in the error message, the error I find is:

<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]: ERROR: Error CREATEing SolrCore 'ranger_audits': Unable to create core [ranger_audits] Caused by: Java heap space
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]:
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]: + exit_code=1
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]: + [[ 1 -ne 0 ]]
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]: + echo 1
<13>Mar  3 12:22:59 google-dataproc-startup[15744]: <13>Mar  3 12:22:59 post-hdfs-activate-component-ranger[15786]: + log_and_fail ranger 'Component ranger failed to activate post-hdfs'

The creation command that I ran is:

gcloud dataproc clusters create cluster_name \
  --bucket cluster_bucket \
  --region us-central1 \
  --subnet subnet_dataproc \
  --zone us-central1-c \
  --master-machine-type n1-standard-8 \
  --master-boot-disk-size 500 \
  --num-workers 2 \
  --worker-machine-type n1-standard-8 \
  --worker-boot-disk-size 1000 \
  --image-version 2.0.29-debian10 \
  --optional-components ZEPPELIN,RANGER,SOLR \
  --autoscaling-policy=autoscale_rule \
  --properties="dataproc:ranger.kms.key.uri=projects/gcp-project/locations/global/keyRings/kbs-dataproc-keyring/cryptoKeys/kbs-dataproc-key,dataproc:ranger.admin.password.uri=gs://cluster_bucket/kerberos-root-principal-password.encrypted,hive:hive.metastore.warehouse.dir=gs://cluster_bucket/user/hive/warehouse,dataproc:solr.gcs.path=gs://cluster_bucket/solr2,dataproc:ranger.cloud-sql.instance.connection.name=gcp-project:us-central1:ranger-metadata,dataproc:ranger.cloud-sql.root.password.uri=gs://cluster_bucket/ranger-root-mysql-password.encrypted" \
  --kerberos-root-principal-password-uri=gs://cluster_bucket/kerberos-root-principal-password.encrypted \
  --kerberos-kms-key=projects/gcp-project/locations/global/keyRings/kbs-dataproc-keyring/cryptoKeys/kbs-dataproc-key \
  --project gcp-project \
  --enable-component-gateway \
  --initialization-actions gs://goog-dataproc-initialization-actions-us-central1/cloud-sql-proxy/cloud-sql-proxy.sh,gs://cluster_bucket/hue.sh,gs://goog-dataproc-initialization-actions-us-central1/livy/livy.sh,gs://goog-dataproc-initialization-actions-us-central1/sqoop/sqoop.sh \
  --metadata livy-timeout-session='4h' \
  --metadata "hive-metastore-instance=gcp-project:us-central1:hive-metastore" \
  --metadata "kms-key-uri=projects/gcp-project/locations/global/keyRings/kbs-dataproc-keyring/cryptoKeys/kbs-dataproc-key" \
  --metadata "db-admin-password-uri=gs://cluster_bucket/hive-root-mysql-password.encrypted" \
  --metadata "db-hive-password-uri=gs://cluster_bucket/hive-mysql-password.encrypted" \
  --scopes=default,sql-admin

I know it's something related to the Ranger/Solr setup, but I don't know how to increase this heap size without writing an alternative initialization script or building a custom machine image. If you have any idea how to solve this, or need more information about my setup, please let me know.
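
For context, the kind of workaround I'm trying to avoid would be something like the initialization action sketched below, which bumps the Solr heap on the master before Ranger creates the ranger_audits core. The file paths, the SOLR_HEAP variable, and whether such a script even runs early enough in cluster creation are all assumptions on my part, not something I've verified:

#!/bin/bash
# Hypothetical sketch only: raise the Solr JVM heap on the master node.
# Paths and variable names follow a standard Solr service install and may
# differ on the Dataproc 2.0 image.
set -euxo pipefail

ROLE="$(/usr/share/google/get_metadata_value attributes/dataproc-role)"

if [[ "${ROLE}" == "Master" ]]; then
  for f in /etc/default/solr.in.sh /etc/default/solr; do
    if [[ -f "${f}" ]]; then
      # Append a larger heap setting and restart Solr so it takes effect.
      echo 'SOLR_HEAP="2g"' >> "${f}"
      systemctl restart solr || true
      break
    fi
  done
fi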


There are 2 answers below.

Answer 1

This is not required as this feature is enabled by default:

--initialization-actions gs://goog-dataproc-initialization-actions-us-central1/cloud-sql-proxy/cloud-sql-proxy.sh,gs://cluster_bucket/hue.sh,gs://goog-dataproc-initialization-actions-us-central1/livy/livy.sh,gs://goog-dataproc-initialization-actions-us-central1/sqoop/sqoop.sh \

In Cloud SQL for MySQL, set the flag log_bin_trust_function_creators to ON. This flag controls whether stored function creators can be trusted not to create stored functions that write unsafe events to the binary log. You can reset the flag to OFF after cluster creation. Please try this and let me know if you still see issues.
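
If it helps, the flag can be set from the command line. A rough example, assuming the Ranger metadata database is the ranger-metadata instance from the connection name in your create command (note that --database-flags overwrites any flags already set on the instance, so include existing ones as well):

gcloud sql instances patch ranger-metadata \
  --project gcp-project \
  --database-flags=log_bin_trust_function_creators=on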

Also, please share the error log for more details.

Answer 2

This could be caused by an invalid character in a username in the SSH configuration for the Dataproc project, which makes the hadoop fs -mkdir -p command fail during the HDFS activation step of cluster creation.

You can follow these steps for a solution.

  1. Create the cluster using the command line in GCP.
  2. Add the flag --metadata=block-project-ssh-keys=true when creating the cluster, as shown in the example below. Even if you already pass --metadata for other purposes, still add this flag.

You can see an example below:

gcloud dataproc clusters create cluster_name \
  --metadata=block-project-ssh-keys=true \
  --bucket cluster_bucket \
  --region us-central1 \
  --subnet subnet_dataproc \
   ………..