Spark Read BigQuery External Table


Trying to read an external table from BigQuery, but getting an error.

    SCALA_VERSION="2.12"
    SPARK_VERSION="3.1.2"
    com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0,
    com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.24.2

    table = 'data-lake.dataset.member'
    df = spark.read.format('bigquery').load(table)
    df.printSchema()

Result:

root
  |-- createdAtmetadata: date (nullable = true)
  |-- eventName: string (nullable = true)
  |-- producerName: string (nullable = true)

But when I run:

df.createOrReplaceTempView("member")
spark.sql("select * from member limit 100").show()

I get this error message:

INVALID_ARGUMENT: request failed: Only external tables with connections can be read with the Storage API.

2 Answers

Best answer:

As external tables are not supported by the connector's direct read, I tried the query-based approach instead, and it worked:

    def read_query_bigquery(project, query):
        # Read via a query instead of a direct table read; the connector
        # materializes the query result into a temporary table first.
        df = spark.read.format('bigquery') \
            .option('parentProject', project) \
            .option('query', query) \
            .option('viewsEnabled', 'true') \
            .load()
        return df

    project = 'data-lake'
    query = 'select * from data-lake.dataset.member'
    # Dataset where the connector materializes query results
    spark.conf.set('materializationDataset', 'dataset')
    df = read_query_bigquery(project, query)
    df.show()
Other answer:

The BigQuery connector uses the BigQuery Storage API to read the data. At the moment this API does not support external tables, thus the connector doesn't support them either.
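This limitation can be handled generically by routing external tables (and views) through a query, as the accepted answer does. A minimal sketch, assuming the table type has already been fetched from BigQuery metadata (for example `google.cloud.bigquery.Table.table_type`, which reports "EXTERNAL" for external tables); the helper name and option mapping here are illustrative, not part of the connector:

```python
def bigquery_read_options(table_ref: str, table_type: str) -> dict:
    """Pick spark.read options for a BigQuery table.

    Native tables can be read directly through the Storage API; external
    tables and views must go through a query with viewsEnabled=true,
    since the Storage API does not support them.
    """
    if table_type == "TABLE":
        # Direct read: the Storage API streams the table's rows.
        return {"table": table_ref}
    # Fallback: the connector runs the query and materializes the result
    # into a temporary table that the Storage API can read.
    return {
        "query": f"select * from `{table_ref}`",
        "viewsEnabled": "true",
    }

# External table -> query-based read
print(bigquery_read_options("data-lake.dataset.member", "EXTERNAL"))
# Native table -> direct read
print(bigquery_read_options("data-lake.dataset.member", "TABLE"))
```

The returned dict can be applied with `spark.read.format('bigquery').options(**opts).load()`, keeping the decision logic out of the read call itself.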