I am trying to load a table across several SQL Teradata tables that are "joined together" using a RODBC connection. There are unique IDs that I want to specify so it only pulls their data. The SQL statement currently works with the "sqlQuery" R function when I insert a few of the IDs in the query text. However, I have many more IDs (e.g., 10,000's, 100,000's) that I want to pull the data for. I only have these IDs stored within my local R environment, so I cannot join by it from a SQL table. My question is how can I pull the data for these IDs by referencing them using the R environment variable?
I have attempted to provide an example that illustrates the problem. The example shows a working case that just "pastes" 10 IDs directly into the SQL statement. However, this option does not work when their are many more "IDs" (10,000's, 100,000's, etc.). Unfortunately I don't have permission to the Teradata DB, so I cannot join by IDs there. I only have the IDs from a .csv file, SAS dataset, etc. Also I can't just place all the IDs in the SQL statement like this example, because it is way too long with all those IDs. I have seen how others use SAS SQL statements that "reference" the IDs that are in the local SAS environment (uses * symbol), so I hope that I can do something similar in R. Thank you.
#for loading data
library(RODBC)
# Connect to Teradata SQL server
TeradataConnect <- odbcConnect(dsn="****")
#random sample of 10 IDs but their are many more (e.g., 100,000's)
IDs = c(855820852, 1461838083, 861608839, 1498698119, 817883914, 776309362,
799069685, 825438654, 856801569, 1749317792)
# specificy dates you want to pull data over
LowDate = Sys.Date() - 365
UpDate = Sys.Date()
#SQL query for pulling data
SQL_query = paste("SELECT Distinct S.ID
,S.LOC_ID
,S.DATE
,I.ITEM_DESC
FROM table.DATA S
INNER JOIN item.DATA I
ON S.ID=I.ID
WHERE 1 =1
AND I.ITEM_TYPE= 'NORMAL'
AND S.DATE BETWEEN ", paste(paste0("'", LowDate, "'")), " AND ", paste(paste0("'", UpDate, "'")), "
AND S.ID IN (", paste(paste(IDs, collapse = ", ")), ")
GROUP BY
S.ID
,S.LOC_ID
,S.DATE
,I.ITEM_DESC
")
#pulls all data from a random sample of IDs
Dataset = sqlQuery(TeradataConnect, SQL_query)
Update: Thanks @RYoda for your suggestions. I am adding an example of the SAS code so you can see how it works. I am not a SAS expert, but I received these queries from someone who is. It is my understanding that anytime they use a "&" character (e.g., "ids." where ids is a variable on the SAS environment) in the SQL statement it will use the data that is within the SAS environment. Unfortunately I don't have write access to the SQL table so I can't let the database do the work. It looks like use a workaround like you suggested, because the prefiltered data is much too large for me to load within my R environment and then perform a join using a data.table or dplyr join. I was hoping that I could find a more elegant solution, since their is a lot of the time where I am trying to use R environment variables like this to load/filter on SQL table data pulls. I think the expected final filtered data size will be several millions of rows and ~10 columns. Thanks again for your help.
PROC SQL;
CONNECT TO teradata(db=VIEWS user=&td_user. password=&td_pwd. tdpid=tableid mode=teradata);
CREATE TABLE Str_inv AS
SELECT * FROM connection to Teradata
(
SELECT
nbr1,
nbr2,
b.item_nbr,
sum(inv.qty) AS QTY
FROM
table.inventory inv
INNER JOIN
table.hier b
ON
inv.item_id=b.item_id
INNER JOIN
table.loc c
ON
inv.str_loc_id = c.loc_id
WHERE
inv.cal_dt = &fw_end_dt. AND
b.ITEM_TYP_DESC IN ('NORMAL') AND
b.CORE_ID = 1 AND
b.item_id IN (&ids.)
GROUP BY
1,2,3
);
QUIT;