I am trying to perform a query using golang mgo to effectively get distinct values from a join. I understand that this might not be the best paradigm to work with in Mongo.
Something like this:
    pipe := []bson.M{
        {
            "$group": bson.M{
                "_id": bson.M{"user": "$user"},
            },
        },
        {
            "$match": bson.M{
                "_id":  bson.M{"$exists": 1},
                "user": bson.M{"$exists": 1},
                "date_updated": bson.M{
                    "$gt": durationDays,
                },
            },
        },
        {
            "$lookup": bson.M{
                "from":         "users",
                "localField":   "user",
                "foreignField": "_id",
                "as":           "user_details",
            },
        },
        {
            "$lookup": bson.M{
                "from":         "organizations",
                "localField":   "organization",
                "foreignField": "_id",
                "as":           "organization_details",
            },
        },
    }
    err := d.Pipe(pipe).All(&result)
If I comment out the $group section, the query returns the join as expected.
If I run it as is, I get NULL.
If I move the $group to the bottom of the pipe, I get an array response with null values.
Is it possible to do an aggregation with a $group (with the goal of simulating DISTINCT)?
The reason you're getting NULL is that your $match filter is filtering out all of the documents after the $group phase.

After your first $group stage, the documents contain only the _id field, for example {"_id": {"user": <user value>}}. They no longer contain the other fields, i.e. user, date_updated and organization. If you would like to keep their values, you can utilise a Group Accumulator Operator. Depending on your use case, you may also benefit from using Aggregation Expression Variables.

As an example using the mongo shell, let's use the $first operator, which basically picks the first occurrence, e.g. "organization": {"$first": "$organization"} inside the $group stage. This may make sense for organization but not for date_updated; please choose a more appropriate accumulator operator for that field. Note that you can also replace {"_id":{"user":"$user"}} with the simpler {"_id":"$user"}.

Next, we'll add a $project stage to rename the _id field produced by the group operation back to user, and carry along the other fields without modification.

Your $match stage can be simplified by listing just the date_updated filter. First, we can remove _id, as it's no longer relevant at this point in the pipeline. Also, if you would like to make sure that you only process documents with a user value, you should place a $match before the $group. See Aggregation Pipeline Optimization for more.

So, combining all of the above, the pipeline becomes: a $match on user, the $group with accumulators, the $project that renames _id to user, the $match on date_updated, and then the two $lookup stages.

Lastly (I know you're aware of it), given the database schema above with users and organizations collections, you may want to reconsider embedding some values, depending on your application's use case. You may find 6 Rules of Thumb for MongoDB Schema Design useful.