I'm testing Great Expectations as a way to get the invalid records that violate the defined rules. The documentation says we can specify `include_unexpected_rows` or `return_unexpected_index_query` in the result format, but neither works for me. I'm applying the expectation to a Spark DataFrame; below is my code:
```python
import great_expectations as ge
from great_expectations.dataset.sparkdf_dataset import SparkDFDataset

# `spark` is the active SparkSession (e.g. the one a notebook provides)
df = spark.read.table("data_quality_test")
df_ge = SparkDFDataset(df)

# Ask for the most detailed result, including the failing rows
result_format = {
    "result_format": "COMPLETE",
    "include_unexpected_rows": True,
}
result = df_ge.expect_column_values_to_be_in_type_list(
    "page_title", ["DateType"], result_format=result_format
)
print(result)
```
Could anyone please help me figure out what the problem could be?
I think there are two things going on in your example:
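1. `expect_column_values_to_be_in_type_list` in Spark just checks the type of the whole column, so there are no individual rows it could report as unexpected.
2. To get the failing rows, use a value-level expectation instead (for example `expect_column_values_to_be_in_set`, so it will check row-wise):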
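To make the difference concrete, here is a plain-Python toy. It is not Great Expectations code: `check_column_type` and `check_values_in_set` are hypothetical helpers, the rows and schema are made-up data, and the result keys are only loosely modeled on the `COMPLETE` result format. The point it illustrates is that a column-type check has nothing row-level to return, while a row-wise set check can list the offending indices and rows.

```python
from typing import Any, Dict, Iterable, List

# Made-up rows standing in for the "data_quality_test" table (hypothetical data)
ROWS: List[Dict[str, Any]] = [
    {"id": 1, "page_title": "Home"},
    {"id": 2, "page_title": "2021-01-01"},
    {"id": 3, "page_title": "About"},
]

# Toy schema: in Spark, every value in a column shares one declared type
SCHEMA: Dict[str, str] = {"id": "IntegerType", "page_title": "StringType"}


def check_column_type(schema: Dict[str, str], column: str, type_list: Iterable[str]) -> Dict[str, Any]:
    """Hypothetical column-level check: inspects only the declared column type.

    No individual row is evaluated, so the result has an observed value
    but no unexpected rows.
    """
    observed = schema[column]
    return {"success": observed in type_list, "result": {"observed_value": observed}}


def check_values_in_set(rows: List[Dict[str, Any]], column: str, value_set: set) -> Dict[str, Any]:
    """Hypothetical row-wise check: tests every row, so failures can be listed."""
    unexpected_index_list = [i for i, row in enumerate(rows) if row[column] not in value_set]
    unexpected_rows = [rows[i] for i in unexpected_index_list]
    return {
        "success": not unexpected_rows,
        "result": {
            "element_count": len(rows),
            "unexpected_count": len(unexpected_rows),
            "unexpected_list": [row[column] for row in unexpected_rows],
            "unexpected_index_list": unexpected_index_list,
            "unexpected_rows": unexpected_rows,
        },
    }


print(check_column_type(SCHEMA, "page_title", ["DateType"]))
print(check_values_in_set(ROWS, "page_title", {"Home", "About"}))
```

The first call can only say the column's type is wrong; the second one points at the row with `id` 2.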