Copy Redshift table from S3 csv file using Python?


What is the recommended module and syntax to programmatically copy from an S3 csv file to a Redshift table? I've been trying with the psycopg2 module, but without success (see psycopg2 copy_expert() - how to copy in a gzipped csv file?). I've tried cur.execute(), cur.copy_expert() and cur.copy_from(), all unsuccessfully. My experience and the comments I've read lead me to conclude that psycopg2, while sufficient for programming a Postgres DB from Python, will not work for Redshift tables for some reason. So what is the workaround if I want a Python script to do this copy?

Here is the COPY statement I want to run. The source is a gzipped csv file with a pipe delimiter. This works fine from a SQL interface like DBeaver, but I can't figure out how it would translate to Python:

'''COPY <destination_table> FROM 's3://bucket/my_source_file.csv.gz' CREDENTIALS <my credentials> DELIMITER '|' IGNOREHEADER 1 ENCODING UTF8 IGNOREBLANKLINES NULL AS 'NULL' EMPTYASNULL BLANKSASNULL GZIP ACCEPTINVCHARS TIMEFORMAT 'auto' DATEFORMAT 'auto' MAXERROR 100 COMPUPDATE ON;'''

There are 2 best solutions below

I use ODBC via the pyodbc library successfully. Just call .execute() with the COPY command and you shouldn't have an issue.
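A minimal sketch of that approach, assuming the Amazon Redshift ODBC driver is installed and a DSN has been configured; the DSN name, table, credentials, and IAM role ARN below are placeholders:

import pyodbc

# Placeholder DSN -- install the Amazon Redshift ODBC driver and configure a DSN first.
conn = pyodbc.connect("DSN=my_redshift_dsn;UID=myuser;PWD=mypassword", autocommit=False)
cur = conn.cursor()

# The same COPY statement that works in DBeaver is passed straight through.
cur.execute("""
    COPY my_destination_table
    FROM 's3://bucket/my_source_file.csv.gz'
    CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER '|' IGNOREHEADER 1 GZIP
""")
conn.commit()  # commit so the loaded rows become visible
conn.close()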

There are plenty of examples online of connecting to Amazon Redshift from Python. They typically look like this:

import psycopg2

conn = psycopg2.connect(...)  # host, port, dbname, user, password for the cluster
cur = conn.cursor()
cur.execute("COPY...")        # the full COPY statement, exactly as it runs in DBeaver
conn.commit()                 # COPY runs inside a transaction, so commit it
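Putting that together with the COPY statement from the question, a minimal sketch might look like the following; the cluster endpoint, database, table name, and IAM role ARN are placeholders, and psycopg2 simply sends the statement to Redshift, which reads the file from S3 itself:

import psycopg2

# Placeholder connection details -- substitute your own cluster endpoint and credentials.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="mydb",
    user="myuser",
    password="mypassword",
)

copy_sql = """
COPY my_destination_table
FROM 's3://bucket/my_source_file.csv.gz'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '|' IGNOREHEADER 1 ENCODING UTF8 IGNOREBLANKLINES
NULL AS 'NULL' EMPTYASNULL BLANKSASNULL GZIP ACCEPTINVCHARS
TIMEFORMAT 'auto' DATEFORMAT 'auto' MAXERROR 100 COMPUPDATE ON;
"""

cur = conn.cursor()
cur.execute(copy_sql)  # no data flows through Python; Redshift pulls the file from S3 server-side
conn.commit()          # make the loaded rows visible
cur.close()
conn.close()

Unlike copy_expert() or copy_from(), nothing is streamed from the client here; the COPY is ordinary SQL executed with execute(), which is why it behaves the same way it does in DBeaver.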