Great Expectations expect_column_values_to_not_be_null method does not fail values with nan


I just started working with Great Expectations and PySpark, so please bear with me if I have done something wrong. So far I have read an Excel file and applied a couple of Great Expectations quality checks to it. The sample data looks like this:

Col1    Col2
val1
val2

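As a side note on how the data comes in: pandas reads empty Excel cells as float NaN rather than Python None. A minimal pandas-only sketch, using hypothetical data mirroring the table above:

```python
import pandas as pd

# Hypothetical frame standing in for the sample sheet: Col2 is entirely empty.
df_pandas = pd.DataFrame({"Col1": ["val1", "val2"], "Col2": [float("nan")] * 2})

# Empty cells show up as float NaN, which pandas counts as missing values.
print(df_pandas["Col2"].isna().sum())  # → 2
print(df_pandas["Col2"].dtype)         # → float64
```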
I'm using the following snippet to read the file and apply the expect_column_values_to_not_be_null method:

from pyspark.sql import SparkSession
import pandas as pd
from great_expectations.dataset import SparkDFDataset

# Create a SparkSession
spark = SparkSession.builder.appName("ReadExcel").getOrCreate()

# Read Excel file into a Pandas dataframe
df_pandas = pd.read_excel("sampledata.xlsx", sheet_name='sheet')

# Convert Pandas dataframe to Spark dataframe
df_spark = spark.createDataFrame(df_pandas)

dfForSparkFromGe = SparkDFDataset(df_spark)

mandatory_cols = [
    "Col1",
    "Col2",
]

def check_not_null_for_mandatory_cols(cols):
    for col in cols:
        try:
            check = dfForSparkFromGe.expect_column_values_to_not_be_null(col)
            if check.success:
                print(f"no null values found for {col}")
            else:
                raise Exception(
                    f"{check.result['unexpected_count']} of {check.result['element_count']} are null for {col}: FAILED")
        except AssertionError as e:
            print(e)

check_not_null_for_mandatory_cols(mandatory_cols)

From this I'm expecting the program to throw an exception, since the sample dataset doesn't have values for Col1 and Col2. Even when I inspect the dfForSparkFromGe dataframe I can see the values as NaN, so based on the docs it should fail the check.
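One thing worth checking (a guess on my part, not a confirmed answer): Spark distinguishes a float NaN from a SQL NULL, so if the pandas NaN values survive createDataFrame as NaN doubles, a not-null check could still pass. A pandas-side sketch that replaces NaN with None before handing the frame to Spark, again with hypothetical data:

```python
import pandas as pd

# Hypothetical frame standing in for the Excel data: Col2 is entirely empty.
df_pandas = pd.DataFrame({"Col1": ["val1", "val2"], "Col2": [float("nan")] * 2})

# Cast to object first so None is not coerced back to NaN in float columns,
# then replace every missing value with None. spark.createDataFrame maps
# Python None to a real NULL, while float NaN arrives as a NaN double.
df_clean = df_pandas.astype(object).where(df_pandas.notna(), None)
print(df_clean["Col2"].tolist())  # → [None, None]
```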

Am I missing something here?

