I need to construct a query, using PyMongo, which gets data from two related collections in a MongoDB database.
Collection X has fields UserId, Name, and EmailId:
[
{
"UserId" : "941AB",
"Name" : "Alex Andresson",
"EmailId" : "[email protected]"
},
{
"UserId" : "768CD",
"Name" : "Bryan Barnes",
"EmailId" : "[email protected]"
}
]
Collection Y has fields UserId1, UserId2, and Rating:
[
{
"UserId1" : "941AB",
"UserId2" : "768CD",
"Rating" : 0.8
}
]
I need to print the name and email id of UserId1 and UserId2 and the rating, something like this:
[
{
"UserId1" : "941AB",
"UserName1" : "Alex Andresson",
"UserEmail1" : "[email protected]",
"UserId2" : "768CD",
"UserName2" : "Bryan Barnes",
"UserEmail2" : "[email protected]",
"Rating": 0.8
}
]
That means I need to fetch data from collection Y as well as from collection X. I'm working with PyMongo right now and have not been able to find a solution. Can somebody give me some pseudocode for this, or suggest an approach for how to move forward?
You need to do the join manually or use some library that will do it for you - maybe mongoengine.
Basically you need to find the ratings you are interested in and then find the users that are related to those ratings.
Example:
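The following is only a minimal sketch of the two ways to do the manual join, not a definitive implementation. It assumes ratings and users are PyMongo collections for Y and X respectively; the function names join_per_rating, join_batched and the helper _merge are made up for illustration, and missing users are simply rendered as None.

```python
def _merge(rating, user1, user2):
    # Hypothetical helper: build one output row in the shape the question asks for.
    return {
        "UserId1": rating["UserId1"],
        "UserName1": user1.get("Name"),
        "UserEmail1": user1.get("EmailId"),
        "UserId2": rating["UserId2"],
        "UserName2": user2.get("Name"),
        "UserEmail2": user2.get("EmailId"),
        "Rating": rating["Rating"],
    }


def join_per_rating(ratings, users):
    """First approach: one query for the ratings, then two user lookups per rating."""
    results = []
    for rating in ratings.find():
        user1 = users.find_one({"UserId": rating["UserId1"]}) or {}
        user2 = users.find_one({"UserId": rating["UserId2"]}) or {}
        results.append(_merge(rating, user1, user2))
    return results


def join_batched(ratings, users):
    """Second approach: three queries in total, regardless of the number of ratings."""
    rating_docs = list(ratings.find())
    # $in needs a list, so the de-duplicated ids are converted back from sets.
    ids1 = list({r["UserId1"] for r in rating_docs})
    ids2 = list({r["UserId2"] for r in rating_docs})
    users1 = {u["UserId"]: u for u in users.find({"UserId": {"$in": ids1}})}
    users2 = {u["UserId"]: u for u in users.find({"UserId": {"$in": ids2}})}
    return [
        _merge(r, users1.get(r["UserId1"], {}), users2.get(r["UserId2"], {}))
        for r in rating_docs
    ]


# Usage (assumes a MongoDB server on localhost; the database name "mydb" is hypothetical):
#   from pymongo import MongoClient
#   db = MongoClient()["mydb"]
#   print(join_batched(db["Y"], db["X"]))
```

Both functions return the same list of dictionaries; they differ only in how many round trips they make to the server.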
Notice that the first approach calls one find for ratings and two finds per rating, but the second approach calls just three finds in total. This would cause a huge performance difference if you are accessing MongoDB over the network.
I recommend using _id instead of UserId if possible for the users collection.
Of course this particular use case would be much easier with an SQL database. If you are using MongoDB for performance and you have many more reads than writes, then consider caching the related users' Name in the rating document.
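As a rough illustration of the caching idea, the sketch below stores a copy of each user's Name inside the rating document and refreshes those copies when a user is renamed. The function names build_rating_doc and rename_user and the UserName1/UserName2 field names are assumptions made for this example; users and ratings are assumed to be PyMongo collections.

```python
def build_rating_doc(user1, user2, rating):
    """Build a rating document that carries a cached copy of both users' names."""
    return {
        "UserId1": user1["UserId"],
        "UserName1": user1["Name"],  # cached copy of the user's Name
        "UserId2": user2["UserId"],
        "UserName2": user2["Name"],  # cached copy of the user's Name
        "Rating": rating,
    }


def rename_user(users, ratings, user_id, new_name):
    """Update the user and every cached copy of the name in the ratings."""
    users.update_one({"UserId": user_id}, {"$set": {"Name": new_name}})
    ratings.update_many({"UserId1": user_id}, {"$set": {"UserName1": new_name}})
    ratings.update_many({"UserId2": user_id}, {"$set": {"UserName2": new_name}})
```

With this layout a single find on the ratings collection already returns the names, at the price of extra writes whenever a name changes.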