Since MySQL is driving me nuts, I'm trying to get acquainted with my first "NoSQL" DBMS, which happened to be MongoDB. I'm connecting to it via rmongodb.
The more I play around with rmongodb, the more questions and problems come up with respect to running advanced queries.
First I present some example data; then I go into detail about the different types of queries that I can't seem to specify correctly.
Example Data
The example is taken from the MongoDB website and has been simplified a bit.
pkg <- "rmongodb"
if (!require(pkg, character.only=TRUE)) {
install.packages(pkg)
require(pkg, character.only=TRUE)
}
# Connect to DB
db <- "test"
ns <- "posts"
mongo <- mongo.create(db=db)
# Insert document to collection 'test.users'
b <- mongo.bson.from.list(list(
"_id"="alex",
name=list(first="Alex", last="Benisson"),
karma=1.0,
age=30,
test=c("a", "b")
))
mongo.insert(mongo, "test.users", b)
# Insert document to collection 'test.posts'
b <- mongo.bson.from.list(list(
"_id"="abcd",
when=mongo.timestamp.create(strptime("2011-09-19 02:00:00",
"%Y-%m-%d %H:%M:%S"), increment=1),
author="alex",
title="Some title",
text="Some text.",
tags=c("tag.1", "tag.2"),
votes=5,
voters=c("jane", "joe", "spencer", "phyllis", "li"),
comments=list(
list(
who="jane",
when=mongo.timestamp.create(strptime("2011-09-19 04:00:00",
"%Y-%m-%d %H:%M:%S"), increment=1),
comment="Some comment."
),
list(
who="meghan",
when=mongo.timestamp.create(strptime("2011-09-20 13:00:00",
"%Y-%m-%d %H:%M:%S"), increment=1),
comment="Some comment."
)
)
)
)
b
mongo.insert(mongo, "test.posts", b)
Two questions related to inserting JSON/BSON objects:
- Document 'test.posts', field voters: is it correct to use c() in this case?
- Document 'test.posts', field comments: what's the right way to specify this, c() or list()?
Top Level Queries: they work a treat
Top level queries work just fine:
# Get all posts by 'alex' (only titles)
res <- mongo.find(mongo, "test.posts", query=list(author="alex"),
fields=list(title=1L))
out <- NULL
while (mongo.cursor.next(res))
out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))
> out
[[1]]
_id title
"abcd" "Some title"
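The cursor-draining loop above is repeated for every query in this post, so it may be worth wrapping once. This is a sketch: the name find_all() is hypothetical and not part of rmongodb. It uses only mongo.find(), mongo.cursor.next(), mongo.cursor.value(), mongo.bson.to.list() and mongo.cursor.destroy(), and assumes an open connection such as the mongo object created above.

```r
library(rmongodb)

# Hypothetical helper (not part of rmongodb): run a query and return every
# matching document as an R list, releasing the cursor when done.
find_all <- function(mongo, ns, query = mongo.bson.empty(),
                     fields = mongo.bson.empty()) {
  cursor <- mongo.find(mongo, ns, query = query, fields = fields)
  on.exit(mongo.cursor.destroy(cursor))  # free server-side cursor resources
  out <- list()
  while (mongo.cursor.next(cursor)) {
    out[[length(out) + 1L]] <- mongo.bson.to.list(mongo.cursor.value(cursor))
  }
  out
}
```

With the connection from above, find_all(mongo, "test.posts", query=list(author="alex"), fields=list(title=1L)) should return the same result as the explicit loop. Pre-allocating via out[[length(out) + 1L]] avoids the repeated c() concatenation of the original loop.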
Question 1: Basic Sub Level Queries
How can I run simple "sub-level queries" (as opposed to top-level queries) that need to reach into arbitrarily deep sublevels of a JSON/BSON-style MongoDB object? These sub-level queries make use of MongoDB's dot notation, and I can't figure out how to map that to a valid rmongodb query.
In the plain MongoDB shell, something like
> db.posts.find( { "comments.who" : "meghan" } )
would work, but I can't figure out how to do that with rmongodb functions.
Here's what I tried so far:
# Get all comments by 'meghan' from 'test.posts'
#--------------------
# Approach 1)
#--------------------
res <- mongo.find(mongo, "test.posts", query=list(comments=list(who="meghan")))
out <- NULL
while (mongo.cursor.next(res))
out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))
> out
NULL
# Does not work
#--------------------
# Approach 2)
#--------------------
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "comments")
mongo.bson.buffer.append(buf, "who", "meghan")
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
res <- mongo.find(mongo, "test.posts", query=query)
out <- NULL
while (mongo.cursor.next(res))
out <- c(out, list(mongo.bson.to.list(mongo.cursor.value(res))))
> out
NULL
# Does not work
Question 2: Queries Using $ Operators
These work:
Query 1
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "age")
mongo.bson.buffer.append(buf, "$lte", 30)
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria
> mongo.find.one(mongo, "test.users", query=criteria)
_id : 2 alex
name : 3
first : 2 Alex
last : 2 Benisson
karma : 1 1.000000
age : 1 30.000000
test : 4
0 : 2 a
1 : 2 b
Query 2
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "test")
mongo.bson.buffer.append(buf, "$in", c("a", "z"))
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
criteria
mongo.find.one(mongo, "test.users", query=criteria)
However, notice that passing a single value (a length-one vector) instead of a vector results in a return value of NULL:
mongo.bson.buffer.append(buf, "$in", "a")
# Instead of 'mongo.bson.buffer.append(buf, "$in", c("a", "z"))'
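A likely explanation, offered as a hedge rather than a confirmed diagnosis: mongo.bson.buffer.append() appends a character vector with more than one element as a BSON array, but a single string as a plain string, and the server rejects $in when its argument is not an array, so mongo.find.one() presumably comes back NULL. A minimal sketch that forces a one-element array by opening it explicitly with mongo.bson.buffer.start.array() (BSON array elements are keyed "0", "1", ...):

```r
library(rmongodb)

# Build { test: { $in: [ "a" ] } } with an explicit one-element array.
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "test")
mongo.bson.buffer.start.array(buf, "$in")
mongo.bson.buffer.append(buf, "0", "a")  # array elements use "0", "1", ... keys
mongo.bson.buffer.finish.object(buf)     # closes the $in array
mongo.bson.buffer.finish.object(buf)     # closes the "test" subobject
criteria <- mongo.bson.from.buffer(buf)
print(criteria)
```

In the printed BSON, $in should now show type 4 (array), as test does in the output above, rather than type 2 (string). With the connection from the question, mongo.find.one(mongo, "test.users", query=criteria) should then match alex.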
Trying the same with sub-level queries, I'm lost again:
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "name")
mongo.bson.buffer.start.object(buf, "first")
mongo.bson.buffer.append(buf, "$in", c("Alex", "Horst"))
mongo.bson.buffer.finish.object(buf)
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
> criteria
name : 3
first : 3
$in : 4
0 : 2 Alex
1 : 2 Horst
> mongo.find.one(mongo, "test.users", query=criteria)
NULL
Either c() or list() can be OK; it depends on whether the components are named and whether they all have the same type (for list). The best thing to do is to look at the generated BSON and see whether you are getting what you want. For the best control over the generated object, use mongo.bson.buffer and the functions that operate on it. In fact, this is why the sub-queries are failing: 'comments' is being created as a subobject rather than an array. mongo.bson.from.list() is handy, but it doesn't give you the same control, and sometimes it guesses wrong about what to generate from complicated structures.
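Following that advice, here is a sketch of re-creating the 'test.posts' document with a buffer so that comments becomes a BSON array of subobjects. It reuses the question's field values; the helpers make_ts() and append_comment() are hypothetical, introduced only for this sketch, and the strptime() format uses %S for seconds.

```r
library(rmongodb)

# Hypothetical helper: a MongoDB timestamp from a "YYYY-mm-dd HH:MM:SS" string.
make_ts <- function(x) {
  mongo.timestamp.create(as.POSIXct(strptime(x, "%Y-%m-%d %H:%M:%S")),
                         increment = 1L)
}

# Hypothetical helper: append one comment as a subobject under `key`.
append_comment <- function(buf, key, who, when, comment) {
  mongo.bson.buffer.start.object(buf, key)
  mongo.bson.buffer.append(buf, "who", who)
  mongo.bson.buffer.append(buf, "when", when)
  mongo.bson.buffer.append(buf, "comment", comment)
  mongo.bson.buffer.finish.object(buf)
}

buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "_id", "abcd")
mongo.bson.buffer.append(buf, "when", make_ts("2011-09-19 02:00:00"))
mongo.bson.buffer.append(buf, "author", "alex")
mongo.bson.buffer.append(buf, "title", "Some title")
mongo.bson.buffer.append(buf, "text", "Some text.")
mongo.bson.buffer.append(buf, "tags", c("tag.1", "tag.2"))  # multi-element vector: array
mongo.bson.buffer.append(buf, "votes", 5)
mongo.bson.buffer.append(buf, "voters",
                         c("jane", "joe", "spencer", "phyllis", "li"))

# Open 'comments' explicitly as an array; its elements are keyed "0", "1", ...
mongo.bson.buffer.start.array(buf, "comments")
append_comment(buf, "0", "jane", make_ts("2011-09-19 04:00:00"), "Some comment.")
append_comment(buf, "1", "meghan", make_ts("2011-09-20 13:00:00"), "Some comment.")
mongo.bson.buffer.finish.object(buf)  # closes the comments array

post <- mongo.bson.from.buffer(buf)
print(post)
```

Printing post should show comments with type 4 (array) and each element as type 3 (object). Since the _id is unchanged, the earlier document would need to be removed (for example with mongo.remove()) before calling mongo.insert(mongo, "test.posts", post).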
The query on the other set of data can be corrected like so though:
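A minimal sketch of such a correction, assuming (as MongoDB's dot notation requires) that the whole path name.first is written as a single key. The nested name/first form asks for a name field exactly equal to the embedded document { first: { $in: [...] } }, which never matches.

```r
library(rmongodb)

# Use the dotted path "name.first" as one key instead of nesting a
# "first" subobject inside a "name" subobject.
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "name.first")
mongo.bson.buffer.append(buf, "$in", c("Alex", "Horst"))
mongo.bson.buffer.finish.object(buf)
criteria <- mongo.bson.from.buffer(buf)
print(criteria)
```

With the connection from the question, mongo.find.one(mongo, "test.users", query=criteria) should then return alex's document.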
Note that you definitely need to use a buffer here since R will choke on the dotted name.
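The same dotted-key approach should cover Question 1, provided comments is stored as an array (for example, re-inserted through a buffer). On an array field, "comments.who" matches if any element has that who value. A sketch of the query, built with a buffer:

```r
library(rmongodb)

# Equivalent of the shell query db.posts.find({ "comments.who": "meghan" }):
# the dotted path is a single key, appended directly to the buffer.
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "comments.who", "meghan")
query <- mongo.bson.from.buffer(buf)
print(query)
```

Passing query to mongo.find(mongo, "test.posts", query=query) and draining the cursor as in the question should then return the post. If comments is still stored as a subobject, the query will presumably keep returning nothing.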
I hope this straightens out your problem. Let me know if you have any further questions.