Looping with dynamic variables in R

I have built an API query for a service and I would like to create a loop that iterates through dates building up multiple end data frames. The code I have, so far, looks like this:

query1 <- "search publications in full_data for \"\\\"Education\\\"\" 
where type in [ \"article\" ] 
and (category_for.name ~\"Education\") 
and date_inserted >= \"2019-01\" and date_inserted < \"2019-02\"
return publications[type + all]"

x1 <- dsApiRequest(token = token, query = query1)  # send the query built above
m1 <- dsApi2df(x1)                                 # convert the response to a data frame

What I want to do is step the dates forward two months at a time, building up from query1, x1 and m1 to queryn, xn and mn. Written in full, for the first 2 passes, it would look like this:

query1 <- "search publications in full_data for \"\\\"Education\\\"\" 
where type in [ \"article\" ] 
and (category_for.name ~\"Education\") 
and date_inserted >= \"2019-01\" and date_inserted < \"2019-02\"
return publications[type + all]"

x1 <- dsApiRequest(token = token, query = query1)
m1 <- dsApi2df(x1)

THEN

query2 <- "search publications in full_data for \"\\\"Education\\\"\" 
where type in [ \"article\" ] 
and (category_for.name ~\"Education\") 
and date_inserted >= \"2019-03\" and date_inserted < \"2019-04\"
return publications[type + all]"

x2 <- dsApiRequest(token = token, query = query2)
m2 <- dsApi2df(x2)

Note the date must also change with each pass.

There are 1 best solutions below

I like the base sprintf command for things like this, though the glue package is new and has a nice interface. With sprintf you put %s as a placeholder inside a string, and then you can use additional arguments to replace with values.

I've "simplified" your query to focus on the changing dates.

query  <- "blah blah
and date_inserted >= \"%s\" and date_inserted < \"%s\"
return blah blah"

library(lubridate)
start_dates = seq(as.Date("2019-01-01"), as.Date("2020-09-01"), by = "2 months")
end_dates = start_dates + months(1) # lubridate is only used here for this nice months() function

query_vec = sprintf(query, format(start_dates, "%Y-%m"), format(end_dates, "%Y-%m"))
query_vec
# [1] "blah blah\nand date_inserted >= \"2019-01\" and date_inserted < \"2019-02\"\nreturn blah blah"
# [2] "blah blah\nand date_inserted >= \"2019-03\" and date_inserted < \"2019-04\"\nreturn blah blah"
# [3] "blah blah\nand date_inserted >= \"2019-05\" and date_inserted < \"2019-06\"\nreturn blah blah"
# ...
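
Once you have a vector of queries, one way to run them all is to loop with lapply and keep the results in a list instead of numbered variables like x1, x2, m1, m2. This is only a sketch: fetch_one is a hypothetical stand-in I made up so the example runs offline, and I'm assuming dsApi2df takes whatever dsApiRequest returns, as in your question.

```r
# Hypothetical stand-in for the real API call so this sketch runs offline.
# With the real service it would presumably be something like:
#   fetch_one <- function(q) dsApi2df(dsApiRequest(token = token, query = q))
fetch_one <- function(q) {
  data.frame(query = q, n_chars = nchar(q), stringsAsFactors = FALSE)
}

query <- "blah blah
and date_inserted >= \"%s\" and date_inserted < \"%s\"
return blah blah"

# Base-R only: two sequences, each stepping two months at a time
start_dates <- seq(as.Date("2019-01-01"), as.Date("2019-05-01"), by = "2 months")
end_dates   <- seq(as.Date("2019-02-01"), as.Date("2019-06-01"), by = "2 months")

query_vec <- sprintf(query, format(start_dates, "%Y-%m"), format(end_dates, "%Y-%m"))

# One data frame per query, named by the start month of its window
results <- lapply(query_vec, fetch_one)
names(results) <- format(start_dates, "%Y-%m")

second_window <- results[["2019-03"]]  # what would have been m2
combined <- do.call(rbind, results)    # optional: stack everything into one data frame
```

A named list is usually easier to work with than variables called m1 ... mn, since you can index it by name or position and bind it together at the end.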

With glue, you can put variable names inside {braces} in your string, and they will automatically be filled in when you glue() it. Somewhat confusingly, the result prints without quotes, but it is still a character vector and will work just fine. This uses the same start_dates and end_dates as above; because they are not passed through format() here, the dates are inserted in full YYYY-MM-DD form.

library(glue)
glue_query = "blah blah
and date_inserted >= \"{start_dates}\" and date_inserted < \"{end_dates}\"
return blah blah"

query_vec = glue(glue_query)
query_vec
# blah blah
# and date_inserted >= "2019-01-01" and date_inserted < "2019-02-01"
# return blah blah
# blah blah
# and date_inserted >= "2019-03-01" and date_inserted < "2019-04-01"
# return blah blah
# ...
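
If the API needs the year-month form from your question rather than full dates, one option is to format the dates before gluing. A minimal sketch, where start_ym and end_ym are names I've introduced just for illustration:

```r
library(glue)

start_dates <- seq(as.Date("2019-01-01"), as.Date("2019-05-01"), by = "2 months")
end_dates   <- seq(as.Date("2019-02-01"), as.Date("2019-06-01"), by = "2 months")

# Pre-format to "YYYY-MM" so the braces only need a plain variable name
start_ym <- format(start_dates, "%Y-%m")
end_ym   <- format(end_dates, "%Y-%m")

glue_query <- "blah blah
and date_inserted >= \"{start_ym}\" and date_inserted < \"{end_ym}\"
return blah blah"

# glue() is vectorised: one query string per start/end pair
query_vec <- glue(glue_query)
second_query <- as.character(query_vec)[2]
```

You could also write the format() call directly inside the braces, since glue evaluates whatever R expression it finds there, but pre-formatting keeps the template easier to read.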