Connecting BigQuery to Dataproc Metastore/Hive tables

Is it possible to connect BigQuery to a Hive/Dataproc Metastore database? I don't want to load the Hive tables (ORC or Parquet) into BigQuery's internal storage. If BigQuery could route its SQL to Hive, and Hive then ran the query on Spark, that would work. I considered using the Hive CLI instead of BigQuery to execute queries, but being able to do it via BigQuery would give a unified interface for ad-hoc SQL. I also considered external tables in BigQuery, which can point directly at the raw Parquet/ORC locations (a rough sketch of that approach is below). However, the ORC tables are ACID tables managed by Hive, so having BigQuery read the raw ORC files directly may result in inconsistent reads.
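For illustration, the external-table approach I mean would look roughly like this (a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, and bucket names are placeholders):

```python
from google.cloud import bigquery

# Placeholder project/dataset/bucket names -- adjust to your environment.
client = bigquery.Client(project="my-project")

# Define an external table that reads Parquet files directly from GCS,
# bypassing BigQuery managed storage.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/warehouse/events/*.parquet"]

table = bigquery.Table("my-project.my_dataset.events_external")
table.external_data_configuration = external_config

# Queries against this table read the raw files in place; for Hive-managed
# ACID ORC tables this is exactly where inconsistent reads become a concern.
client.create_table(table)
```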

There are 2 best solutions below

It is possible to connect Hive/Dataproc to BigQuery (or vice versa) by using the Spark BigQuery Connector. Note that Spark SQL natively supports Hive but not BigQuery; reads from and writes to BigQuery go through the spark-bigquery-connector instead.
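A minimal sketch of how the connector is typically used from PySpark, assuming the spark-bigquery-connector jar is on the classpath and the table, dataset, and bucket names below are placeholders:

```python
from pyspark.sql import SparkSession

# Assumes the spark-bigquery-connector jar is available (e.g. via --jars or
# spark.jars.packages); all resource names below are placeholders.
spark = (
    SparkSession.builder
    .appName("hive-to-bigquery")
    .enableHiveSupport()
    .getOrCreate()
)

# Read a Hive table through Spark SQL (backed by the Dataproc Metastore).
hive_df = spark.sql("SELECT * FROM warehouse_db.events")

# Write the result into BigQuery via the connector.
(
    hive_df.write.format("bigquery")
    .option("table", "my-project.my_dataset.events")
    .option("temporaryGcsBucket", "my-staging-bucket")
    .mode("overwrite")
    .save()
)

# Reading a BigQuery table back into Spark works the same way:
bq_df = (
    spark.read.format("bigquery")
    .option("table", "my-project.my_dataset.events")
    .load()
)
```

Note that this routes data through Spark jobs rather than letting BigQuery query the Hive tables directly.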

I was able to achieve this by using the BigLake Metastore catalog. I stumbled upon this document while looking to expose Apache Iceberg external tables to BigQuery. It seems you can use the same catalog to expose Hive (or Dataproc Metastore) tables to BigQuery as well.
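For reference, a rough sketch of wiring a Spark session to the BigLake Metastore Iceberg catalog (the catalog implementation class and property names follow the Iceberg/BigLake documentation I used and may differ per release; the project, location, and bucket values are placeholders, and the Iceberg and BigLake catalog jars are assumed to be on the classpath):

```python
from pyspark.sql import SparkSession

# Sketch only: registers a catalog named "blms" backed by BigLake Metastore.
spark = (
    SparkSession.builder
    .appName("biglake-iceberg")
    .config("spark.sql.catalog.blms", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.blms.catalog-impl",
            "org.apache.iceberg.gcp.biglake.BigLakeCatalog")
    .config("spark.sql.catalog.blms.gcp_project", "my-project")
    .config("spark.sql.catalog.blms.gcp_location", "us-central1")
    .config("spark.sql.catalog.blms.blms_catalog", "my_catalog")
    .config("spark.sql.catalog.blms.warehouse", "gs://my-bucket/warehouse")
    .getOrCreate()
)

# Tables created through this catalog are registered in BigLake Metastore and
# can then be surfaced in BigQuery as Iceberg external tables.
spark.sql("CREATE NAMESPACE IF NOT EXISTS blms.analytics")
spark.sql(
    "CREATE TABLE IF NOT EXISTS blms.analytics.events "
    "(id BIGINT, name STRING) USING iceberg"
)
```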