There was a bug in my Scala code that formats the date part of a timestamp, which is then concatenated as a String with a non-timestamp column in a Spark Streaming job:
concat(date_format(col("timestamp"),"yyyy-MM-DD'T'HH:mm:ss.SSS'Z'")
So, during the tests everything was OK: the tests sending messages to Kafka passed, and I was able to see those messages in Kafka Tool:
Note the 292nd of October there, caused by DD instead of dd in the formatter.
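
For reference, this is a minimal sketch of the corrected expression, using dd (day-of-month) instead of DD (day-of-year); otherCol is just a placeholder name for the non-timestamp column mentioned above:

import org.apache.spark.sql.functions.{col, concat, date_format}

// "dd" is day-of-month; "DD" is day-of-year, which rendered 2021-10-19 as "2021-10-292".
// "otherCol" is a placeholder for the non-timestamp column being concatenated.
val formatted = concat(
  date_format(col("timestamp"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"),
  col("otherCol")
)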
But then on the executor there was an extra check that didn't pass, and the job crashed:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 8.0 failed 1 times, most recent failure: Lost task 1.0 in stage 8.0 (TID 12, kafkadatageneratorjob-driver, executor driver): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to format it to '2021-10-292T14:27:12.577Z' in the new formatter. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
How can I enable the same strict check in unit tests so that they also fail on these checks, without an explicit assertion on the formatted value, but just by forcing the timeParserPolicy check to be executed in tests as well?
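
To illustrate, this is roughly the direction I have in mind for the test setup, assuming spark.sql.legacy.timeParserPolicy set to EXCEPTION is the strict Spark 3 default mode that the error message refers to:

import org.apache.spark.sql.SparkSession

// Sketch of a test SparkSession with the policy pinned explicitly; "EXCEPTION" is the
// Spark 3 default that raises SparkUpgradeException instead of silently falling back
// to the legacy formatter.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("timestamp-format-test")
  .config("spark.sql.legacy.timeParserPolicy", "EXCEPTION")
  .getOrCreate()

Is pinning the config like this on the test session enough, or does the check need something else to be triggered when running in local mode?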