How to write a dataframe column with json data (STRING type) to BigQuery table as JSON type using pyspark?


I have a PySpark DataFrame with a column containing a JSON string (the column type is string). I would like to write this DataFrame to a BigQuery table with the column type as JSON. I found the following information at https://github.com/GoogleCloudDataproc/spark-bigquery-connector:

Spark has no JSON type. The values are read as String. In order to write JSON back to BigQuery, the following conditions are REQUIRED:

  • Use the INDIRECT write method
  • Use the AVRO intermediate format
  • The DataFrame field MUST be of type String and has an entry of sqlType=JSON in its metadata

I am not sure how to set an entry of sqlType=JSON in the DataFrame field metadata. Can someone please help?

I am using the code below to write the DataFrame to the BigQuery table:

df.write \
  .format("bigquery") \
  .option("temporaryGcsBucket","some-bucket") \
  .save("dataset.table")

1 Answer


Have a look at the DataFrame.withMetadata method, which lets you attach metadata to a column:

pyspark.sql.DataFrame.withMetadata

df_meta = df.withMetadata('age', {'foo': 'bar'})
df_meta.schema['age'].metadata
# {'foo': 'bar'}
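
Combining that with the connector's requirements, a minimal sketch might look like the following (assuming PySpark 3.3+ for withMetadata, a hypothetical JSON string column named payload, and the dataset.table / some-bucket names from the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-bigquery").getOrCreate()

# Hypothetical DataFrame with a JSON string column named "payload"
df = spark.createDataFrame([('{"a": 1}',)], ["payload"])

# Tag the column with sqlType=JSON so the connector can write it as BigQuery JSON
df_json = df.withMetadata("payload", {"sqlType": "JSON"})

# Use the INDIRECT write method with the AVRO intermediate format, as required
df_json.write \
    .format("bigquery") \
    .option("writeMethod", "indirect") \
    .option("intermediateFormat", "avro") \
    .option("temporaryGcsBucket", "some-bucket") \
    .save("dataset.table")

On PySpark versions before 3.3, where withMetadata is not available, the same metadata can be attached via an alias, e.g. df.withColumn("payload", col("payload").alias("payload", metadata={"sqlType": "JSON"})).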