R: Convert Rmongo output to dataframe

329 Views Asked by At

I have been using the package Rmongo to pull data from a MongoDB.

library(Rmongo)

mongo <- mongoDbConnect("cmdbData", host="XX-MONGODB-02", port=27017)
data_users <- dbGetQuery(mongo, 'computers', '{}')

The pulled data looks like this:

             update_bol      key      cData
1            delete          NA       "{ \"name\" : \"name1\" , \"domain\" : \"xx.yy.dk\"}"
2            update          NA       "{ \"name\" : \"name2\" , \"domain\" : \"xx.yy.dk\"}"
3            update          NA       "{ \"name\" : \"name3\" , \"domain\" : \"xx.yy.dk\"}"
4            update          NA       "{ \"name\" : \"null\" , \"domain\" : \"xx.yy.dk\"}"
5            update          NA       "{ \"name\" : \"name5\" , \"domain\" : \"zz.yy.dk\"}"
6            delete          NA       "{ \"name\" : \"name6\" , \"domain\" : \"zz.yy.dk\"}"

The dataframe can be manually recreated using:

data.frame(update_bol = c("delete", "update", "update", "update", "update", "delete"),
       key = c(NA, NA, NA, NA, NA, NA), 
       cData = c("{ \"name\" : \"name1\" , \"domain\" : \"xx.yy.dk\"}", "{ \"name\" : \"name2\" , \"domain\" : \"xx.yy.dk\"}", "{ \"name\" : \"name3\" , \"domain\" : \"xx.yy.dk\"}", "{ \"name\" : \"null\" , \"domain\" : \"xx.yy.dk\"}", "{ \"name\" : \"name5\" , \"domain\" : \"zz.yy.dk\"}", "{ \"name\" : \"name6\" , \"domain\" : \"zz.yy.dk\"}"), stringsAsFactors = FALSE)

I would like this output:

             name       domain
1            name1      xx.yy.dk
2            name2      xx.yy.dk
3            name3      xx.yy.dk
4            NULL       xx.yy.dk
5            name5      zz.yy.dk
6            name6      zz.yy.dk

I have attempted to make more advanced queries to output the cData column directly from the dbGetQuery-function, but I am new to MongoDB, and cannot seem to find an appropriate query.

The format of cData looks like JSON, so I have also tried to use the "jsonlite" package to extract the columns, but I have not been able to make that work either. Do you have any suggestions?

library(jsonlite)
library(tidyverse)

fromJSON(data_users$ciData[1]) %>% as.data.frame

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
arguments imply differing number of rows: 1, 0
2

There are 2 best solutions below

0
On

This might not be the best approach but it will get you going. In the approach I am transforming the data that you have to the desired format.

df<-data.frame(update_bol = c("delete", "update", "update", "update", "update", "delete"),
   key = c(NA, NA, NA, NA, NA, NA), 
   cData = c("{ \"name\" : \"name1\" , \"domain\" : \"xx.yy.dk\"}", "{ \"name\" : \"name2\" , \"domain\" : \"xx.yy.dk\"}", "{ \"name\" : \"name3\" , \"domain\" : \"xx.yy.dk\"}", "{ \"name\" : \"null\" , \"domain\" : \"xx.yy.dk\"}", "{ \"name\" : \"name5\" , \"domain\" : \"zz.yy.dk\"}", "{ \"name\" : \"name6\" , \"domain\" : \"zz.yy.dk\"}"), stringsAsFactors = FALSE)

clean<-function(x){
cleand_x<-gsub(pattern = '[\\{\\}\\"]',replacement = "",x = df$cData,fixed=F)
cleand_x<-strsplit(cleand_x,split = " ")
final<-sapply(cleand_x,function(t)
{
    c(name=t[[4]],domain=t[[8]])
},simplify = T)
return(as.data.frame(t(final)))
}

clean(df)

Output

name   domain
1 name1 xx.yy.dk
2 name2 xx.yy.dk
3 name3 xx.yy.dk
4 name4 xx.yy.dk 
5 name5 zz.yy.dk
6 name6 zz.yy.dk
0
On

I would recommend the mongolite package over RMongo. You can use mongo queries to pull data in from your mongodb, and into a dataframe. If you are querying data that contains arrays, you will need to use $unwind and possibly "flatten" (flatten is from jsonlite package).