Python version: 3.8.10 (upgraded from Python 3.6)
NumPy version: 1.21.5 (upgraded from NumPy 1.19.5)
great-expectations version: 0.16.16 (upgraded from great-expectations 0.13.11)
While validating the DataFrame with Great Expectations, I get an exception for the column NumMonths: its actual data type does not match the expected data type.
DataFrame
import pandas as pd

data = {'Ids': [f'A{i}' for i in range(1, 11)],
        'NumMonths': [3, 6, 9, 12, 3, 6, 9, 12, 3, 6]}
df = pd.DataFrame(data)
# As written, this reassignment leaves the column and its dtype unchanged.
df['NumMonths'] = df['NumMonths']
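To see which dtype the NumMonths column actually carries, here is a minimal sketch that rebuilds the DataFrame above and prints the dtype before and after casting to pandas' nullable Int64 type (the name `df_nullable` is only for illustration):

```python
import pandas as pd

data = {'Ids': [f'A{i}' for i in range(1, 11)],
        'NumMonths': [3, 6, 9, 12, 3, 6, 9, 12, 3, 6]}
df = pd.DataFrame(data)

# Built from plain Python ints, the column gets NumPy's lowercase int64 dtype.
print(df['NumMonths'].dtype)

# Casting to pandas' nullable extension type gives the capitalised Int64 dtype.
df_nullable = df.astype({'NumMonths': 'Int64'})
print(df_nullable['NumMonths'].dtype)
```

The two names differ only in capitalisation, but they are different dtypes: `int64` is a NumPy dtype, while `Int64` is a pandas extension dtype.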
Great Expectations suite
{
  "data_asset_type": "Dataset",
  "expectation_suite_name": "testsuite",
  "expectations": [
    {
      "expectation_type": "expect_table_columns_to_match_set",
      "kwargs": {
        "column_set": [
          "Ids",
          "NumMonths"
        ],
        "exact_match": true
      },
      "meta": {
        "severity": "critical"
      }
    },
    {
      "expectation_type": "expect_select_column_values_to_be_unique_within_record",
      "kwargs": {
        "column_list": [
          "Ids"
        ]
      },
      "meta": {
        "severity": "critical"
      }
    },
    {
      "expectation_type": "expect_column_values_to_be_in_type_list",
      "kwargs": {
        "column": "Ids",
        "type_list": [
          "str"
        ]
      },
      "meta": {
        "severity": "critical"
      }
    },
    {
      "expectation_type": "expect_column_values_to_be_in_type_list",
      "kwargs": {
        "column": "NumMonths",
        "type_list": [
          "Int32",
          "Int64"
        ]
      },
      "meta": {
        "severity": "critical"
      }
    },
    {
      "expectation_type": "expect_column_values_to_be_between",
      "kwargs": {
        "column": "NumMonths",
        "min_value": 3,
        "max_value": 12
      },
      "meta": {
        "severity": "warning"
      }
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "Ids",
        "mostly": 1
      },
      "meta": {
        "severity": "critical"
      }
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "NumMonths",
        "mostly": 1
      },
      "meta": {
        "severity": "critical"
      }
    }
  ],
  "meta": {
    "great_expectations_version": "0.16.16"
  }
}
The above code worked fine with the older versions of Python, NumPy and Great Expectations.
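The suite lists only the capitalised pandas names Int32 and Int64 for NumMonths. The sketch below uses a hypothetical helper, `dtype_in_type_list`, that simply compares a column's dtype name against the type list. It is a deliberate simplification to show why a plain NumPy int64 column would not match that list under a name-based comparison; it is not how Great Expectations actually evaluates `expect_column_values_to_be_in_type_list`.

```python
import pandas as pd


def dtype_in_type_list(series: pd.Series, type_list: list) -> bool:
    """Hypothetical, simplified check: is the column's dtype name in type_list?

    NOT Great Expectations' implementation; for illustration only.
    """
    return series.dtype.name in type_list


type_list = ["Int32", "Int64"]  # the type_list used for NumMonths in the suite

plain = pd.Series([3, 6, 9, 12], dtype="int64")     # NumPy-backed dtype
nullable = pd.Series([3, 6, 9, 12], dtype="Int64")  # pandas nullable extension dtype

print(plain.dtype.name, dtype_in_type_list(plain, type_list))
print(nullable.dtype.name, dtype_in_type_list(nullable, type_list))
```

Under this simplified comparison, only the nullable column passes, because "int64" and "Int64" are different strings.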
What I have found so far:
While investigating, I found that NumPy 1.20.0 introduced changes to dtypes: https://numpy.org/doc/stable/release/1.20.0-notes.html
I also dug into the great-expectations GitHub repository and found that they have written a workaround for the NumPy dtype change: https://github.com/great-expectations/great_expectations/blob/develop/great_expectations/expectations/core/expect_column_values_to_be_in_type_list.py#L319
I used pandas astype() and convert_dtypes() to change the dtype. Both change it to Int64, but as soon as the data goes into Great Expectations' validate(), the column is treated as int64: https://github.com/great-expectations/great_expectations/blob/develop/great_expectations/expectations/core/expect_column_values_to_be_in_type_list.py#L269
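A minimal sketch of the conversion step using pandas alone (Great Expectations is not involved): both astype("Int64") and convert_dtypes() report Int64, yet the extension dtype's `numpy_dtype` is plain NumPy int64. This is one plausible way the capitalised name could be lost downstream; whether validate() inspects the NumPy-level dtype in exactly this way is an assumption, not something the linked source confirms.

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 6, 9, 12, 3, 6, 9, 12, 3, 6])

# Both conversions produce pandas' nullable Int64 extension dtype.
via_astype = s.astype("Int64")
via_convert = s.convert_dtypes()
print(via_astype.dtype, via_convert.dtype)

# The extension dtype is backed by NumPy int64 data, so code that looks at
# the underlying NumPy dtype sees lowercase int64.
print(via_astype.dtype.numpy_dtype)
print(via_astype.dtype.numpy_dtype == np.dtype("int64"))
```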
The above code works if I downgrade NumPy to 1.19.5.
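Because the behaviour changes between NumPy 1.19.5 and 1.20+, it can help to confirm which side of that boundary an environment is on when reproducing. A minimal sketch; the helper `numpy_at_least` is hypothetical and only parses the first two components of `np.__version__`:

```python
import numpy as np
import pandas as pd


def numpy_at_least(major: int, minor: int) -> bool:
    """Hypothetical helper: True if the installed NumPy is at least major.minor."""
    parts = np.__version__.split(".")
    return (int(parts[0]), int(parts[1])) >= (major, minor)


print("NumPy", np.__version__, "| pandas", pd.__version__)
print("NumPy >= 1.20:", numpy_at_least(1, 20))
```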
