I am trying to use SQLAlchemy and Great Expectations for testing data quality of datasets stored in Snowflake DB. The dataset is called candidates
and the column of interest is called first_name
.
However, when I run
sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="candidates", engine=engine, schema=creds["schema"])
sql_dataset.expect_column_values_to_be_in_set("first_name", ['Gather', 'Male'])
I am getting:
ProgrammingError: (snowflake.connector.errors.ProgrammingError) 002003 (42S02): SQL compilation error:
Object 'KEBOOLA_274.WORKSPACE_48777448.CANDIDATES' does not exist or not authorized.
[SQL: SELECT count(*) AS element_count, sum(CASE WHEN (first_name IS NULL) THEN %(param_1)s ELSE %(param_2)s END) AS null_count, sum(CASE WHEN (first_name NOT IN (%(first_name_1)s, %(first_name_2)s) AND first_name IS NOT NULL) THEN %(param_3)s ELSE %(param_4)s END) AS unexpected_count
FROM "WORKSPACE_48777448".candidates]
[parameters: {'param_1': 1, 'param_2': 0, 'first_name_1': 'Gather', 'first_name_2': 'Male', 'param_3': 1, 'param_4': 0}]
(Background on this error at: http://sqlalche.me/e/13/f405)
The issue is the table name is not enclosed in double quotes, hence Snowflake is looking for CANDIDATES
instead of candidates
. The same issue would arise with the column name should the table be found.
I've tested
sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="\"candidates\"",
engine=engine,
schema=creds["schema"])
bu Snowflake is then looking for a table called KEBOOLA_274.WORKSPACE_48777448.""candidates""
.
I know SQLAlchemy by default considers lowercase object names to be case insensitive, therefore it doesn't enclose the names. Is there any way around it please?
I work for Superconductive as a developer of Great Expectations. I submitted a fix for this a few weeks back! You can now specify the
use_quoted_name
property in yourbatch_kwargs.
If this is set to True, it will treat your table and column_names as case sensitive, so you should have no trouble accessing lowercase table names, though you will also need to make sure that the case of your other table and column names are appropriately specified.