Write PySpark dataframe to BigQuery "Numeric" datatype


For simplicity, I have a table in BigQuery with a single field of type NUMERIC. When I try to write a one-column PySpark DataFrame to this table, it keeps raising a NullPointerException. I tried casting the column to int, float, and string, and even encoding it, but it keeps throwing the NullPointerException. Even after spending 5 to 6 hours, I'm unable to figure out, on my own or from the internet, what the issue is and what the exact PySpark DataFrame column type should be to map to BigQuery's NUMERIC column type. Any help or direction would be of great help. Thanks in advance.
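For reference, a minimal sketch of what I'm doing (the column, table, and bucket names below are placeholders for my actual ones):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.5,), (2.0,)], ['amount'])  # one-column dataframe

# Raises NullPointerException when 'amount' targets the NUMERIC column
(df.write
    .format("bigquery")
    .option("table", "my_dataset.my_table")
    .option("temporaryGcsBucket", "my-bucket")
    .save())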


There are 2 best solutions below


For anyone who faces the same issue, you just have to cast the column to a decimal type. BigQuery's NUMERIC is a fixed-point type equivalent to decimal(38, 9), so any DecimalType with precision at most 38 and scale at most 9 maps to it; DecimalType(38, 9) covers NUMERIC's full range.

from pyspark.sql.functions import col
from pyspark.sql.types import DecimalType

# Cast to decimal and assign back (withColumn returns a new DataFrame)
subscriber_df_deu = subscriber_df_deu.withColumn('column', col('column').cast(DecimalType(38, 9)))
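With the column cast, the write itself goes through the spark-bigquery connector as usual. A minimal sketch, assuming the indirect write path (the table and staging bucket names here are placeholders):

# Placeholder dataset/table and GCS staging bucket; substitute your own
(subscriber_df_deu.write
    .format("bigquery")
    .option("table", "my_dataset.my_table")
    .option("temporaryGcsBucket", "my-staging-bucket")
    .mode("append")
    .save())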

This is due to a range limit in the Spark DataFrame's integer type: IntegerType can accommodate only a 10-digit number. To correct this issue, cast the number to the Long datatype.

IntegerType: Represents 4-byte signed integer numbers. The range of numbers is from -2147483648 to 2147483647.

https://spark.apache.org/docs/latest/sql-ref-datatypes.html
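A minimal sketch of that cast (the column name is a placeholder):

from pyspark.sql.functions import col
from pyspark.sql.types import LongType

# LongType is an 8-byte signed integer (-9223372036854775808 to 9223372036854775807),
# so it holds values beyond IntegerType's 10-digit limit
df = df.withColumn('column', col('column').cast(LongType()))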

Hope this helps.