I have a PySpark DataFrame with a column containing a JSON string (the column type is string). I would like to write this DataFrame to a BigQuery table with the column type as JSON. I found the following information at https://github.com/GoogleCloudDataproc/spark-bigquery-connector:
Spark has no JSON type. The values are read as String. In order to write JSON back to BigQuery, the following conditions are REQUIRED:
- Use the INDIRECT write method
- Use the AVRO intermediate format
- The DataFrame field MUST be of type String and have an entry of sqlType=JSON in its metadata
I am not sure how to set the sqlType=JSON entry in a DataFrame field's metadata. Can someone please help?
I am using the code below to write the DataFrame to the BigQuery table:
df.write \
    .format("bigquery") \
    .option("temporaryGcsBucket", "some-bucket") \
    .save("dataset.table")
Have a look at the withMetadata method, pyspark.sql.DataFrame.withMetadata (available since Spark 3.3). It returns a new DataFrame with the given metadata dictionary attached to the named column, which is where the sqlType=JSON entry belongs.
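Here is a minimal sketch of how that could look; the column name "payload" and the sample row are placeholders for your own data. It tags the string column with sqlType=JSON, then writes with the indirect write method and the Avro intermediate format, per the connector documentation quoted in the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder DataFrame with a JSON string column named "payload".
df = spark.createDataFrame([('{"a": 1}',)], ["payload"])

# Attach sqlType=JSON to the column's metadata so the connector
# can write it as a BigQuery JSON column.
df = df.withMetadata("payload", {"sqlType": "JSON"})

# Indirect write with Avro intermediate format, as the connector
# requires for JSON (the AVRO format needs the spark-avro package
# on the classpath).
df.write \
    .format("bigquery") \
    .option("writeMethod", "indirect") \
    .option("intermediateFormat", "avro") \
    .option("temporaryGcsBucket", "some-bucket") \
    .save("dataset.table")

On Spark versions before 3.3, withMetadata is not available, but the same metadata can be attached through Column.alias, e.g. df.withColumn("payload", df["payload"].alias("payload", metadata={"sqlType": "JSON"})).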