I have the following function written in R that (I think) is doing a poor job of updating my mongo databases collections.
library(mongolite)
con <- mongolite::mongo(collection = "mongo_collection_1", db = 'mydb', url = 'myurl')
myRdataframe1 <- con$find(query = '{}', fields = '{}')
rm(con)
con <- mongolite::mongo(collection = "mongo_collection_2", db = 'mydb', url = 'myurl')
myRdataframe2 <- con$find(query = '{}', fields = '{}')
rm(con)
... code to update my dataframes (rbind additional rows onto each of them) ...
# write dataframes to database
write.dfs.to.mongodb.collections <- function() {
collections <- c("mongo_collection_1", "mongo_collection_2")
my.dataframes <- c("myRdataframe1", "myRdataframe2")
# loop dataframes, write colllections
for(i in 1:length(collections)) {
# connect and add data to this table
con <- mongo(collection = collections[i], db = 'mydb', url = 'myurl')
con$remove('{}')
con$insert(get(my.dataframes[i]))
con$count()
rm(con)
}
}
write.dfs.to.mongodb.collections()
My dataframes myRdataframe1
and myRdataframe2
are very large dataframes, currently ~100K rows and ~50 columns. Each time my script runs, it:
- uses con$find('{}') to pull the mongodb collection into R, saved as a dataframe
myRdataframe1
- scrapes new data from a data provider that gets appended as new rows to
myRdataframe1
- uses con$remove() and con$insert to fully remove the data in the mongodb collection, and then re-insert the entire
myRdataframe1
This last bullet point is iffy, because I run this R script daily in a cronjob and I don't like that each time I am entirely wiping the mongo db collection and re-inserting the R dataframe to the collection.
If I remove the con$remove() line, I receive an error that states I have duplicate _id keys. It appears I cannot simply append using con$insert().
Any thoughts on this are greatly appreciated!
When you attempt to insert documents into MongoDB that already exist in the database as per their primary key you will get the duplicate key exception. In order to work around that you can simply unset the
_id
column using something like this before thecon$insert
:This way, the newly inserted document will automatically get a new
_id
assigned.