I'm trying to create a cluster on GCP Dataproc with an external Hive metastore, and I need the Hive metastore to write its data to GCS.
So I started by creating a MySQL instance on Cloud SQL:
gcloud sql instances create hive-metastore-database-2 \
--tier db-n1-standard-1 \
--activation-policy=ALWAYS \
--region europe-west9
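As a quick sanity check before building the cluster, the instance state can be verified (it should report RUNNABLE):
gcloud sql instances describe hive-metastore-database-2 --format="value(state)"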
Then the cluster:
gcloud dataproc clusters create cluster-hive \
--region europe-west9 \
--worker-machine-type n2-standard-2 \
--scopes sql-admin \
--single-node \
--master-machine-type n2-standard-2 \
--master-boot-disk-size 60 \
--image-version 1.5-rocky8 \
--initialization-actions gs://goog-dataproc-initialization-actions-europe-west9/cloud-sql-proxy/cloud-sql-proxy.sh \
--properties hive:hive.metastore.warehouse.dir=gs://project-test-405020/test-hive-metastore \
--metadata "hive-metastore-instance=project-test-405020:europe-west9:hive-metastore-database-2"
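(Describing the cluster confirms it came up fine; it should report RUNNING:)
gcloud dataproc clusters describe cluster-hive --region europe-west9 --format="value(status.state)"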
Everything is OK... but when I run this Hive job on the cluster:
create database shared_gcs LOCATION 'gs://project-test-405020/'
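For reference, I submit it roughly like this (via the standard gcloud Hive job command; the exact invocation may differ):
gcloud dataproc jobs submit hive \
    --cluster cluster-hive \
    --region europe-west9 \
    -e "create database shared_gcs LOCATION 'gs://project-test-405020/'"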
I get this error:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: hadoopPath must not be null)
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
And I finally found that if I change this parameter:
--image-version 1.5-rocky8 (Spark 2.4)
to:
--image-version 2.1-debian11 (Spark 3)
the Hive job runs successfully.
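In other words, the exact same cluster create command with only the image version changed works:
gcloud dataproc clusters create cluster-hive \
    --region europe-west9 \
    --worker-machine-type n2-standard-2 \
    --scopes sql-admin \
    --single-node \
    --master-machine-type n2-standard-2 \
    --master-boot-disk-size 60 \
    --image-version 2.1-debian11 \
    --initialization-actions gs://goog-dataproc-initialization-actions-europe-west9/cloud-sql-proxy/cloud-sql-proxy.sh \
    --properties hive:hive.metastore.warehouse.dir=gs://project-test-405020/test-hive-metastore \
    --metadata "hive-metastore-instance=project-test-405020:europe-west9:hive-metastore-database-2"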
Any help getting this to work with 1.5-rocky8?
Thanks