Spark NOT NULL constraints in combination with badRecordsPath when reading (Delta) tables


I want to read data from a Delta table (signals) with the following structure:

StructType(
    [
        StructField("timestamp", TimestampType(), True),  # <-- nullable
        StructField("name", StringType(), False),
        StructField("value", DoubleType(), True),
    ]
)

Note that the timestamp column is allowed to be NULL (nullable=True).
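For context, the source table could have been created roughly like this (just a sketch; the table name and the empty-DataFrame setup are my own illustration, not the actual job):

from pyspark.sql.types import (
    StructType, StructField, TimestampType, StringType, DoubleType,
)

signals_schema = StructType(
    [
        StructField("timestamp", TimestampType(), True),  # NULLs allowed
        StructField("name", StringType(), False),
        StructField("value", DoubleType(), True),
    ]
)

# Create an empty Delta table with this schema
# (assumes an existing Delta-enabled SparkSession named spark)
(
    spark.createDataFrame([], signals_schema)
        .write.format("delta")
        .saveAsTable("signals")
)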

One of several jobs reads this data with:

(
    spark.readStream
        .option("badRecordsPath", "path_to_bad_records")
        .schema(expected_schema)
        .table("signals")
)

The expected schema, in contrast, is the following:

StructType(
    [
        StructField("timestamp", TimestampType(), False),  # <-- NOT nullable
        StructField("name", StringType(), False),
        StructField("value", DoubleType(), True),
    ]
)

Note that I do not want to read in any records with timestamp == NULL, even though I know and expect the source to contain some of these records. Ideally I want to save the "malformed" records (those with a NULL timestamp) to a badRecordsPath, i.e. to a log file on that path.
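For comparison, this is roughly the manual workaround I could fall back to if badRecordsPath does not apply here: splitting the stream myself. This is just a sketch; path_to_quarantine and the checkpoint location are placeholder names I made up:

stream = spark.readStream.table("signals")

good_records = stream.filter("timestamp IS NOT NULL")
bad_records = stream.filter("timestamp IS NULL")

# Quarantine the NULL-timestamp records to a separate Delta path
# instead of relying on badRecordsPath
(
    bad_records.writeStream
        .format("delta")
        .option("checkpointLocation", "path_to_quarantine_checkpoint")
        .start("path_to_quarantine")
)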

I am trying to answer the following questions: What happens to records that do not match the expected schema? If they are ignored, will they be written to the badRecordsPath? So far I can only find examples for file sources (JSON, CSV).
