Spark Read BigQuery External Table


Trying to read an external table from BigQuery, but getting an error.

    SCALA_VERSION="2.12"
    SPARK_VERSION="3.1.2"
    com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0,
    com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.24.2

    table = 'data-lake.dataset.member'
    df = spark.read.format('bigquery').load(table)
    df.printSchema()

Result:

root
  |-- createdAtmetadata: date (nullable = true)
  |-- eventName: string (nullable = true)
  |-- producerName: string (nullable = true)

But when I query it:

df.createOrReplaceTempView("member")
spark.sql("select * from member limit 100").show()

I get this error message:

INVALID_ARGUMENT: request failed: Only external tables with connections can be read with the Storage API.
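The error means the connector attempted a direct Storage API read, which only works for native tables. As a sketch of the decision involved (the helper name and return values are illustrative, not part of the connector's API), the read strategy depends on the `table_type` the BigQuery API reports for the table:

```python
def choose_read_strategy(table_type: str) -> str:
    """Pick a read path for the spark-bigquery connector.

    `table_type` is the value reported by the BigQuery API for a table,
    e.g. "TABLE", "EXTERNAL", or "VIEW". Direct Storage API reads only
    work for native tables; external tables and views need the
    query/materialization path instead.
    """
    if table_type == "TABLE":
        return "direct"  # spark.read.format('bigquery').load(table)
    return "query"       # .option('query', ...) with viewsEnabled=true

# An external table falls back to the query path:
print(choose_read_strategy("EXTERNAL"))  # query
```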


2 Answers

Pedro Rodrigues (Best Answer)

Since Spark cannot read external tables directly, I tried the query-based approach instead, and it worked:

    def read_query_bigquery(project, query):
        df = spark.read.format('bigquery') \
            .option('parentProject', project) \
            .option('query', query) \
            .option('viewsEnabled', 'true') \
            .load()
        return df

    project = 'data-lake'
    # Backtick-quote the table name: project IDs with dashes
    # need quoting in BigQuery Standard SQL
    query = 'select * from `data-lake.dataset.member`'
    spark.conf.set('materializationDataset', 'dataset')
    df = read_query_bigquery(project, query)
    df.show()
David Rabinowitz

The bigquery connector uses the BigQuery Storage API to read the data. At the moment this API does not support external tables, so the connector doesn't support them either.
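In other words, the workaround is to route the read through a query, which BigQuery materializes into a temporary native table that the Storage API can then read. A minimal sketch of that path, using the project/dataset names from the question (the Spark calls require a live session with the connector and GCP credentials, so they are illustrative here; the quoting helper is a hypothetical name):

```python
def quoted_table(project: str, dataset: str, table: str) -> str:
    # Project IDs containing dashes (like "data-lake") are safest
    # backtick-quoted in BigQuery Standard SQL.
    return f"`{project}.{dataset}.{table}`"

def read_external_table(spark, project, dataset, table):
    # viewsEnabled + materializationDataset tell the connector to run
    # the query and materialize the result before reading it.
    spark.conf.set("viewsEnabled", "true")
    spark.conf.set("materializationDataset", dataset)
    return (spark.read.format("bigquery")
            .option("parentProject", project)
            .option("query",
                    f"select * from {quoted_table(project, dataset, table)}")
            .load())

print(quoted_table("data-lake", "dataset", "member"))
```

The materialization dataset must already exist and the caller needs `bigquery.tables.create` permission on it, since the connector writes the intermediate result there.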