Batch executing query in RPostgres returning same data

I am currently trying to download a large database using RPostgres. However, when I try to write the results to CSVs in batches, it writes the same data over and over (always the first 1M records). Here is the code I am currently using:

library(DBI)
library(RPostgres)

wrds <- dbConnect(Postgres(),
                  host='wrds-pgdata.wharton.upenn.edu',
                  port=9737,
                  dbname='wrds',
                  sslmode='require',
                  user='username')

res <- dbSendQuery(wrds, "SELECT VARNAMES FROM DATABASE WHERE CONDITIONS")
i = 0

while (!dbHasCompleted(res)) {
  i <- i+1
  chunk <- dbFetch(res, 1000000) # Fetch 1,000,000 at a time
  filename <- paste("path\\EuropeanSMEs_", i)
  write.csv(data, filename, row.names = FALSE)
}

dbClearResult(res)
dbDisconnect(wrds)

How do I amend the code so that it fetches a new set of 1M rows on every iteration?

1 Answer

I don't think there is a simple way to do this. Using PROC EXPORT from SAS would work (you would then need to download the .csv file from WRDS). If I really needed to have a CSV file of a large WRDS data set, I might adapt my function here to write to a local CSV file rather than a PostgreSQL database.
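
A minimal sketch of that idea (not the linked function itself), assuming a DBI connection like the one in the question; the function name fetch_to_csv, the chunk size, and the output path are illustrative placeholders:

library(DBI)
library(RPostgres)

fetch_to_csv <- function(con, sql, path, chunk_size = 1e6) {
  res <- dbSendQuery(con, sql)
  on.exit(dbClearResult(res), add = TRUE)
  first <- TRUE
  while (!dbHasCompleted(res)) {
    chunk <- dbFetch(res, n = chunk_size)
    # Write the header only with the first chunk, then append without it
    write.table(chunk, path, sep = ",", row.names = FALSE,
                col.names = first, append = !first)
    first <- FALSE
  }
  invisible(path)
}

# e.g. fetch_to_csv(wrds, "SELECT VARNAMES FROM DATABASE WHERE CONDITIONS",
#                   "EuropeanSMEs.csv")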

Another option might be to look for a variable that roughly partitions the data into chunks of about one million rows, for example a year or similar field, and use the WHERE clause of the SQL query to get rows in chunks. Using dbplyr (see examples here), you could probably write a small function to collect() the data by year or some other partitioning variable and write each chunk to a CSV.
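
A rough sketch of that approach, assuming the remote table has a year column (here called fyear); the schema, table, and column names are illustrative, not taken from the question:

library(DBI)
library(dplyr)
library(dbplyr)

euro_smes <- tbl(wrds, in_schema("schema_name", "table_name"))  # placeholders

for (yr in 2010:2020) {            # whatever range covers the data
  df <- euro_smes %>%
    filter(fyear == !!yr) %>%      # !! inlines the local value yr into the SQL
    collect()
  write.csv(df, paste0("EuropeanSMEs_", yr, ".csv"), row.names = FALSE)
}

Each iteration pushes the year filter to the server, so only one year's worth of rows is pulled into memory at a time.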