Configuring Dataproc with an external Hive metastore


I'm trying to create a cluster on GCP Dataproc with an external Hive metastore. I need the Hive metastore to write its data to GCS.

So I started by creating a MySQL instance on Cloud SQL:

gcloud sql instances create hive-metastore-database-2 \
--tier db-n1-standard-1 \
--activation-policy=ALWAYS \
--region europe-west9
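
To double-check the instance, its connection name (PROJECT:REGION:INSTANCE, the value the cluster metadata below expects) can be read back:

# Print the connection name to paste into the hive-metastore-instance metadata.
gcloud sql instances describe hive-metastore-database-2 \
    --format="value(connectionName)"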

Then the cluster:

gcloud dataproc clusters create cluster-hive \
    --region europe-west9 \
    --worker-machine-type n2-standard-2  \
    --scopes sql-admin \
    --single-node \
    --master-machine-type n2-standard-2 \
    --master-boot-disk-size 60 \
    --image-version 1.5-rocky8 \
    --initialization-actions gs://goog-dataproc-initialization-actions-europe-west9/cloud-sql-proxy/cloud-sql-proxy.sh \
    --properties hive:hive.metastore.warehouse.dir=gs://project-test-405020/test-hive-metastore \
    --metadata "hive-metastore-instance=project-test-405020:europe-west9:hive-metastore-database-2"
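
To verify that the warehouse property actually landed on the cluster, it can be read back from the cluster config:

# The listed properties should include
# hive:hive.metastore.warehouse.dir=gs://project-test-405020/test-hive-metastore
gcloud dataproc clusters describe cluster-hive \
    --region europe-west9 \
    --format="value(config.softwareConfig.properties)"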

Everything is OK... but when I run this Hive job on the cluster:

create database shared_gcs LOCATION 'gs://project-test-405020/'
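
For reference, one way to run that statement is as a Dataproc Hive job:

# Run the CREATE DATABASE statement as a Dataproc Hive job.
gcloud dataproc jobs submit hive \
    --cluster cluster-hive \
    --region europe-west9 \
    -e "create database shared_gcs LOCATION 'gs://project-test-405020/'"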

I'm getting this error:

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: hadoopPath must not be null)
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
    at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)
    at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

And I finally found that if I change this parameter:

--image-version 1.5-rocky8    (Spark 2.4, Hive 2.3)

to:

--image-version 2.1-debian11    (Spark 3)

the Hive job runs successfully.
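
That is, the exact same command works once the image version is swapped:

# Identical to the cluster-create command above, except for --image-version.
gcloud dataproc clusters create cluster-hive \
    --region europe-west9 \
    --worker-machine-type n2-standard-2 \
    --scopes sql-admin \
    --single-node \
    --master-machine-type n2-standard-2 \
    --master-boot-disk-size 60 \
    --image-version 2.1-debian11 \
    --initialization-actions gs://goog-dataproc-initialization-actions-europe-west9/cloud-sql-proxy/cloud-sql-proxy.sh \
    --properties hive:hive.metastore.warehouse.dir=gs://project-test-405020/test-hive-metastore \
    --metadata "hive-metastore-instance=project-test-405020:europe-west9:hive-metastore-database-2"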

Any help getting this to run with 1.5-rocky8?
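
One lead I haven't verified (just a guess on my part): my LOCATION is the bucket root, and maybe the older Hive / GCS connector shipped with the 1.5 image can't handle 'gs://project-test-405020/' itself, so a subdirectory might behave differently:

# Untested guess: point the database at a subdirectory instead of the bucket root.
gcloud dataproc jobs submit hive \
    --cluster cluster-hive \
    --region europe-west9 \
    -e "create database shared_gcs LOCATION 'gs://project-test-405020/shared_gcs'"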

Thanks
