I'm trying to set up a connection to AWS Redshift from the Great Expectations framework (GE) according to the tutorial, using Python, and I'm facing two issues:
- When I use `postgresql+psycopg2` as the driver in the connection string in step 5, adding the datasource (`context.add_datasource(**datasource_config)`, see the first sketch after this list) takes extremely long (up to 20 minutes!). Validating expectations afterwards works as expected and even runs quite fast. I'm assuming the huge amount of time needed is due to the size of the Redshift cluster I'm connecting to (more than 1000 schemas) and the `postgresql` driver not being optimized for Redshift.
- In search of alternatives to the `postgresql` driver I came across the `sqlalchemy-redshift` driver. Changing it in the connection string (`redshift+psycopg2`) adds the datasource instantly; however, validating some expectations (e.g. `expect_column_values_to_not_be_null`) fails! After some digging through the code I realized it might be due to GE creating a temporary table from the SQL query. So when I specify the query (the second sketch after this list shows how I run it):

  ```sql
  select * from my_redshift_schema.my_table;
  ```

  GE actually seems to run something like:

  ```sql
  CREATE TEMPORARY TABLE "ge_temp_bf3cbfa2" AS select * from my_redshift_schema.my_table;
  ```

  For certain expectations `sqlalchemy-redshift` then tries to find information about the columns of the table; however, it searches for the name of the temporary table and not the actual one I specified in the SQL query. It consequently fails, as it obviously can't find a table with that name in the Redshift cluster. More specifically, it results in a `KeyError` in the `dialect.py` file within `sqlalchemy-redshift`:

  ```
  File ".venv/lib/python3.8/site-packages/sqlalchemy_redshift/dialect.py", line 819, in _get_redshift_columns
      return all_schema_columns[key]
  KeyError: RelationKey(name='ge_temp_bf3cbfa2', schema='public')
  ```
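For reference, here is roughly how I add the datasource, following the tutorial. This is only a minimal sketch: the connection string, host, credentials, and all names (`my_redshift_datasource`, `my_db`, ...) are placeholders for my real values.

```python
import great_expectations as ge

context = ge.get_context()

# Placeholder connection string; swapping "postgresql+psycopg2" for
# "redshift+psycopg2" is the only change between the two options.
connection_string = (
    "postgresql+psycopg2://user:password"
    "@my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/my_db"
)

datasource_config = {
    "name": "my_redshift_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": connection_string,
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
    },
}

# This is the call that takes up to 20 minutes with postgresql+psycopg2,
# but returns instantly with redshift+psycopg2.
context.add_datasource(**datasource_config)
```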
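And this is roughly how I run the failing expectation, again as a minimal sketch with placeholder names (`my_suite`, `my_column`):

```python
from great_expectations.core.batch import RuntimeBatchRequest

# The query from above is passed in as a runtime parameter.
batch_request = RuntimeBatchRequest(
    datasource_name="my_redshift_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="my_table",  # arbitrary label for this batch
    runtime_parameters={"query": "select * from my_redshift_schema.my_table"},
    batch_identifiers={"default_identifier_name": "default_identifier"},
)

context.create_expectation_suite("my_suite", overwrite_existing=True)
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="my_suite",
)

# Passes with postgresql+psycopg2 (after the slow setup); raises the
# KeyError shown above with redshift+psycopg2.
validator.expect_column_values_to_not_be_null(column="my_column")
```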
Has anyone succeeded in running GE on Redshift? How could I mitigate the issues I'm facing (make option 1 faster, or fix the error in option 2)?