Can I upload a Python Pandas DataFrame to the WRDS cloud to use in a raw_sql Statement?

280 Views Asked by At

I am using the wrds module in Python to extract data from WRDS. In SAS, I can easily upload a local file to the WRDS server to use in a SQL statement on the WRDS server. The code for that is:

proc upload data=mydatafile out=mydatafile; run;

Is there something similar in Python? I have a Pandas DataFrame called send_to_crsp containing tickers and dates that I'd like to use in the raw_sql statement below:

import pandas as pd
import wrds

db = wrds.Connection()

rets = db.raw_sql('''
                    select a.TIC, a.DATE_FILED0, a.DATE_FILED5, b.date, b.ret
                    from send_to_crsp as a, crsp.dsf as b
                    where a.TIC=b.tic and b.date >= a.DATE_FILED and b.date <= a.DATE_FILED5
                  ''',
                  date_cols = ['DATE_FILED0','DATE_FILED5','date'])
1

There are 1 best solutions below

0
On

WRDS users do not have CREATE TABLE or even CREATE TEMPORARY TABLE privileges, so there are no options to get data into a table on the WRDS PostgreSQL server.

To handle this issue on R (see StackOverflow question here), I basically translate a local data frame into an SQL statement of the following form:

SELECT * FROM 
    (VALUES (10000, '2012-01-03'), (10001, '2013-02-04')) AS t (permno, date))

Using dplyr it is easy to make this into a "table" that can be joined with crsp.dsf. See dbplyr::copy_inline() for a function that does all this (and see here for illustrations of the function's use).

Unfortunately, support and documentation for these kinds of data analysis workflows seems much weaker for Python. Perhaps something similar could be made using SQLAlchemy.