There was a bug in my Scala code that formats the date part of a timestamp, which is then concatenated as a String with a non-timestamp column in a Spark Streaming job:
concat(date_format(col("timestamp"),"yyyy-MM-DD'T'HH:mm:ss.SSS'Z'")
So, during the tests everything was OK: the tests sending messages to Kafka passed, and I was able to see those messages in Kafka Tool:
Note the 292nd of October there, caused by DD instead of dd in the formatter.
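
For reference, this is a minimal sketch of the corrected expression, using dd (day-of-month) instead of DD (day-of-year); otherCol is just a placeholder name for the non-timestamp column mentioned above:

import org.apache.spark.sql.functions.{col, concat, date_format}

// "dd" is day-of-month; "DD" is day-of-year, which rendered 2021-10-19 as "2021-10-292".
// "otherCol" is a placeholder for the non-timestamp column being concatenated.
val formatted = concat(
  date_format(col("timestamp"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"),
  col("otherCol")
)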
But then on the executor there was an extra check that didn't pass, and the job crashed:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 8.0 failed 1 times, most recent failure: Lost task 1.0 in stage 8.0 (TID 12, kafkadatageneratorjob-driver, executor driver): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to format it to '2021-10-292T14:27:12.577Z' in the new formatter. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
How can I enable the same strict check in unit tests so that they also fail on these checks, without an explicit assertion on the formatted value, but just by forcing the timeParserPolicy check to be executed in tests as well?
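
To illustrate, this is roughly the direction I have in mind for the test setup, assuming spark.sql.legacy.timeParserPolicy set to EXCEPTION is the strict Spark 3 default mode that the error message refers to:

import org.apache.spark.sql.SparkSession

// Sketch of a test SparkSession with the policy pinned explicitly; "EXCEPTION" is the
// Spark 3 default that raises SparkUpgradeException instead of silently falling back
// to the legacy formatter.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("timestamp-format-test")
  .config("spark.sql.legacy.timeParserPolicy", "EXCEPTION")
  .getOrCreate()

Is pinning the config like this on the test session enough, or does the check need something else to be triggered when running in local mode?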