Great Expectations SQLAlchemy doesn't enclose lowercase column names

945 Views Asked by At

I am trying to use SQLAlchemy and Great Expectations for testing data quality of datasets stored in Snowflake DB. The dataset is called candidates and the column of interest is called first_name.

However, when I run

sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="candidates", engine=engine, schema=creds["schema"])
sql_dataset.expect_column_values_to_be_in_set("first_name", ['Gather', 'Male'])

I am getting:

ProgrammingError: (snowflake.connector.errors.ProgrammingError) 002003 (42S02): SQL compilation error:
Object 'KEBOOLA_274.WORKSPACE_48777448.CANDIDATES' does not exist or not authorized.
[SQL: SELECT count(*) AS element_count, sum(CASE WHEN (first_name IS NULL) THEN %(param_1)s ELSE %(param_2)s END) AS null_count, sum(CASE WHEN (first_name NOT IN (%(first_name_1)s, %(first_name_2)s) AND first_name IS NOT NULL) THEN %(param_3)s ELSE %(param_4)s END) AS unexpected_count 
FROM "WORKSPACE_48777448".candidates]
[parameters: {'param_1': 1, 'param_2': 0, 'first_name_1': 'Gather', 'first_name_2': 'Male', 'param_3': 1, 'param_4': 0}]
(Background on this error at: http://sqlalche.me/e/13/f405)

The issue is the table name is not enclosed in double quotes, hence Snowflake is looking for CANDIDATES instead of candidates. The same issue would arise with the column name should the table be found.

I've tested

sql_dataset = great_expectations.dataset.SqlAlchemyDataset(table_name="\"candidates\"",
                                                           engine=engine,
                                                           schema=creds["schema"])

bu Snowflake is then looking for a table called KEBOOLA_274.WORKSPACE_48777448.""candidates"".

I know SQLAlchemy by default considers lowercase object names to be case insensitive, therefore it doesn't enclose the names. Is there any way around it please?

2

There are 2 best solutions below

0
On BEST ANSWER

I work for Superconductive as a developer of Great Expectations. I submitted a fix for this a few weeks back! You can now specify the use_quoted_name property in your batch_kwargs. If this is set to True, it will treat your table and column_names as case sensitive, so you should have no trouble accessing lowercase table names, though you will also need to make sure that the case of your other table and column names are appropriately specified.

0
On

As Mike Walton has suggested in the comments, the solution was to uppercase all table names before using the great_expectations package.