Extract values from an array in mongoDB to dataframe using rmongodb

959 Views Asked by At

I'm querying a database containing entries as displayed in the example. All entries contain the following values:

  • _id: unique id of overallitem and placed_items
  • name: the name of te overallitem
  • loc: location of the overallitem and placed_items
  • time_id: time the overallitem was stored
  • placed_items: array containing placed_items (can range from zero: placed_items : [], to unlimited amount.
  • category_id: the category of the placed_items
  • full_id: the full id of the placed_items

I want to extract the name, full_id and category_id on a per placed_items level given a time_id and loc constraint

Example data:

{
 "_id" : "5040",
 "name" : "entry1",
 "loc" : 1,
 "time_id" : 20121001,
 "placed_items" : [],
}
{
 "_id" : "5041",
 "name" : "entry2",
 "loc" : 1,
 "time_id" : 20121001,
 "placed_items" : [
  {
   "_id" : "5043",
   "category_id" : 101,
   "full_id" : 901,
  },
  {
   "_id" : "5044",
   "category_id" : 102,
   "full_id" : 902,
  }
 ],
}
{
 "_id" : "5042",
 "name" : "entry3",
 "loc" : 1,
 "time_id" : 20121001,
 "placed_items" : [
  {
   "_id" : "5045",
   "category_id" : 101,
   "full_id" : 903,
  },
 ],
}

The expected outcome for this example would be:

"name"    "full_id" "category_id"
"entry2"    901         101
"entry2"    902         102
"entry3"    903         101

So if placed_items is empty, do put the entry in the dataframe and if placed_items containts n entries, put n entries in dataframe

I tried to work out an RBlogger example to create the desired dataframe.

#Set up database
    mongo <- mongo.create()

    #Set up condition
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "loc", 1)
    mongo.bson.buffer.start.object(buf, "time_id")
    mongo.bson.buffer.append(buf, "$gte", 20120930)
    mongo.bson.buffer.append(buf, "$lte", 20121002)
    mongo.bson.buffer.finish.object(buf)
    query <- mongo.bson.from.buffer(buf)

    #Count  
    count <- mongo.count(mongo, "items_test.overallitem", query) 

#Note that these counts don't work, since the count should be based on 
#the number of placed_items in the array, and not the number of entries. 

    #Setup Cursor
    cursor <- mongo.find(mongo, "items_test.overallitem", query)
    #Create vectors, which will be filled by the while loop
    name <- vector("character", count)
    full_id<- vector("character", count)
    category_id<- vector("character", count) 

    i <- 1
    #Fill vectors
    while (mongo.cursor.next(cursor)) {
        b <- mongo.cursor.value(cursor)
        order_id[i] <- mongo.bson.value(b, "name")
        product_id[i] <- mongo.bson.value(b, "placed_items.full_id")
        category_id[i] <- mongo.bson.value(b, "placed_items.category_id")
        i <- i + 1
    }
    #Convert to dataframe
    results <- as.data.frame(list(name=name, full_id=full_uid, category_id=category_id))

The conditions work and the code works if I would want to extract values on an overallitem level (i.e. _id or name) but fails to gather the information on a placed_items level. Furthermore, the dotted call for extracting full_id and category_id does not seem to work. Can anyone help?

0

There are 0 best solutions below