Write PySpark dataframe to BigQuery "Numeric" datatype
For simplicity, I have a table in BigQuery with a single field of type "Numeric". When I try to write a one-column PySpark dataframe to it, the write keeps raising a NullPointerException. I tried casting the PySpark column to int, float, and string, and even encoding it, but the NullPointerException persists. After 5 to 6 hours, I still can't work out, on my own or from searching the internet, what the issue is and which PySpark dataframe column type maps to the BigQuery Numeric column type. Any help or direction would be appreciated. Thanks in advance.
Asked by Malina Dale
Piyush Namra
This is due to the range of Spark's integer type: it can accommodate numbers of at most 10 digits. To correct this issue, cast the column to the Long data type.
IntegerType: Represents 4-byte signed integer numbers. The range of numbers is from -2147483648 to 2147483647.
https://spark.apache.org/docs/latest/sql-ref-datatypes.html
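As a rough illustration of the range quoted above, the bounds can be checked with plain Python arithmetic. The helper names `fits_integer_type` and `fits_long_type` are made up for this sketch and are not part of Spark.

```python
# Bounds of a 4-byte (IntegerType) and an 8-byte (LongType) signed integer.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def fits_integer_type(value):
    """Return True if `value` fits in a 4-byte signed integer."""
    return INT_MIN <= value <= INT_MAX


def fits_long_type(value):
    """Return True if `value` fits in an 8-byte signed integer."""
    return LONG_MIN <= value <= LONG_MAX
```

For example, `fits_integer_type(3_000_000_000)` is false while `fits_long_type(3_000_000_000)` is true, which is why a 10-digit value can overflow an integer column but not a long one.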
Hope this helps.
For anyone who faces the same issue: you just have to cast the column to a decimal type before writing.
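A minimal sketch of that cast, under some assumptions: BigQuery's NUMERIC type holds up to 38 digits with 9 after the decimal point, so `DECIMAL(38, 9)` is used here as the matching Spark type. The helper names `numeric_cast_expr` and `cast_to_numeric` are hypothetical, and `df` is assumed to be an ordinary PySpark DataFrame.

```python
# BigQuery NUMERIC stores up to 38 digits, 9 of them after the decimal point.
# Treating DECIMAL(38, 9) as the matching Spark type is an assumption here.
NUMERIC_PRECISION = 38
NUMERIC_SCALE = 9


def numeric_cast_expr(column):
    """Return a Spark SQL expression casting `column` to DECIMAL(38, 9)."""
    return (
        f"CAST(`{column}` AS DECIMAL({NUMERIC_PRECISION}, {NUMERIC_SCALE})) "
        f"AS `{column}`"
    )


def cast_to_numeric(df, column):
    """Cast one column of a PySpark DataFrame, leaving the others unchanged."""
    exprs = [
        numeric_cast_expr(name) if name == column else f"`{name}`"
        for name in df.columns
    ]
    # DataFrame.selectExpr accepts SQL expression strings, one per output column.
    return df.selectExpr(*exprs)
```

With a dataframe that has a column called, say, `amount` (an illustrative name), `cast_to_numeric(df, "amount")` returns a dataframe whose `amount` column is a decimal, which can then be written to BigQuery the same way as before.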