I am currently trying to download a large database using RPostgres. However, when I try to write the results to CSV files in batches, it writes the same data over and over (always the first 1M records). Here is the code I am currently using:
library(DBI)
library(RPostgres)

wrds <- dbConnect(Postgres(),
                  host = 'wrds-pgdata.wharton.upenn.edu',
                  port = 9737,
                  dbname = 'wrds',
                  sslmode = 'require',
                  user = 'username')
res <- dbSendQuery(wrds, "SELECT VARNAMES FROM DATABASE WHERE CONDITIONS")
i <- 0
while (!dbHasCompleted(res)) {
  i <- i + 1
  chunk <- dbFetch(res, 1000000)  # Fetch 1,000,000 rows at a time
  filename <- paste("path\\EuropeanSMEs_", i)
  write.csv(data, filename, row.names = FALSE)
}
dbClearResult(res)
dbDisconnect(wrds)
How do I amend the code so that it fetches a new set of 1M rows on every iteration?
I don't think there is a simple way to do this. Using PROC EXPORT from SAS would work (you would then need to download the .csv file from WRDS). If I really needed to have a CSV file of a large WRDS data set, I might adapt my function here to write to a local CSV file rather than a PostgreSQL database.

Another option might be to look for a variable that roughly partitions the data into chunks of about one million rows, for example year or similar, and use the WHERE part of the SQL to get rows in chunks. Using dbplyr (see examples here), you could probably write a small function to collect() data by year or some other partitioning variable and write each chunk to a CSV.
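The partition-by-variable idea could be sketched roughly as follows. This is only a sketch: `make_chunk_query`, `export_by_year`, the `year` column, and the `SELECT VARNAMES FROM DATABASE` base query are placeholders standing in for your actual table and partitioning variable, and it assumes each year's slice fits comfortably in memory.

```r
library(DBI)

# Build a query restricted to one value of the partitioning column.
# (Placeholder names; adapt base_query and partition_col to your table.)
make_chunk_query <- function(base_query, partition_col, value) {
  sprintf("%s WHERE %s = %d", base_query, partition_col, value)
}

# Loop over years, fetch one partition per round trip, write one CSV each.
# `con` is assumed to be an open connection as in the question.
export_by_year <- function(con, years, out_prefix) {
  for (y in years) {
    q <- make_chunk_query("SELECT VARNAMES FROM DATABASE", "year", y)
    chunk <- dbGetQuery(con, q)  # fetches the whole partition at once
    write.csv(chunk, paste0(out_prefix, y, ".csv"), row.names = FALSE)
  }
}
```

For example, `export_by_year(wrds, 2010:2020, "path\\EuropeanSMEs_")` would produce one file per year, each holding only that year's rows.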