I am very new to SQL so my apologies if this is an easy question, I didn't find anything while searching but I may have missed obvious search terms.
I am trying to download all transaction data for a set of municipal bonds for which I have a list of CUSIPs, currently being stored as a .txt file with one CUSIP per line. The online version of WRDS allows a user to upload such a .txt file to retrieve their data.
I would like to automate this process in R and followed the WRDS guide for setting up SQL queries in R. Ultimately I will use something along the lines of
res <- dbSendQuery(wrds, "select *
from msrb.msrb
where cusip IN ???")
data <- dbFetch(res, n=-1)
dbClearResult(res)
data
How do I actually get my list of CUSIPs into the query? It is too long for it to be practicable to directly list each CUSIP. Can I reference a .txt file somehow, or at least a character vector in R or something? Is there a better approach?
I think there are two efficient ways to programmatically do
IN (...)
in SQL, and one way that is popular but risky (and I generally discourage it).Using parameter binding. This is practical up to some subjective limit; there might be a real limit on how many parameters
DBI
allows to be bound, but I don't know it; I don't know if SQL implementations often limit the number of values you can put in a literalIN (...)
statement (I just tested PG11 with 5000, no problem). At some point, it might more efficient or desirable to use option 2 below. However, if we're talking on the order of so many dozen, then try this.The use of
($1, $2, $3)
is specific to postgres; other DBMSes may use different nomenclature, including(?,?,?)
(sql server and others).Upload the ids into a temporary table and query against it. (This can also be partially used if you acquire the ids to use from another query, just update the inner SQL to reflect your other query.)
or join against the temp table, with
In general, I am a staunch advocate for using parameter binding, as it will side-step any form of SQL injection, whether malicious or accidental. If you are in a rush, however, you can form the
IN (...)
yourself. You can useglue::glue_sql
to make sure the correct quotes (for your specific DBMS) are always used; if not, it is often safe to use single quotes.Note that
glue::glue_sql
provides the*
notation. From?glue::glue_sql
:For all three methods, I used the more-direct
DBI::dbGetQuery
, but you can still use theDBI::dbSendQuery
/DBI::dbFetch
two-step if you prefer.Depending on the size of your
msrb
table as well as its indices, these queries might not hit all of the optimizations. If that's the case, consider adding to the query based on advice from your DBA.