Aggregation query in mongo and spring-data-mongo

1.1k Views Asked by At

Hi everyone I have a big problem in querying my data. I have documents like this:

{
    "_id" : NumberLong(999789748357864),
    "text" : "#asd #weila #asd2 welcome in my house",
    "date" : ISODate("2016-12-13T21:44:37.000Z"),
    "dateString" : "2016-12-13",
    "hashtags" : [ 
        "asd", 
        "weila", 
        "asd2"
    ]
}

and I want to build two queries:

1) count for each day the number of hashtag and get out for example something like this:

{_id:"2016-12-13",
hashtags:[
{hashtag:"asd",count:20},
{hashtag:"weila",count:18},
{hashtag:"asd2",count:10},
....
]
}

{_id:"2016-12-14",
hashtags:[
{hashtag:"asd",count:18},
{hashtag:"asd2",count:14},
{hashtag:"weila",count:10},
....
]
}

2)another is the same but I want to set a period from 2016-12-13 to 2016-12-17.

For the first one I write this query and I get what I search but in Spring Data Mongo I don't know how to write.

db.comment.aggregate([
{$unwind:"$hashtags"},
{"$group":{
    "_id":{ 
        "date" : "$dateString",
        "hashtag": "$hashtags"
    },
    "count":{"$sum":1}
    }
},
{"$group":{
    "_id": "$_id.date",
    "hashtags": { 
       "$push": { 
       "hashtag": "$_id.hashtag",
       "count": "$count"
     }},
     "count": { "$sum": "$count" }
}},
{"$sort": { count: -1}},
{"$unwind": "$hashtags"},
{"$sort": { "count": -1, "hashtags.count": -1}},
{"$group": {
        "_id": "$_id",
        "hashtags": { "$push": "$hashtags" },
        "count": { "$first": "$count" }
    }},
{$project:{name:1,hashtags: { $slice: ["$hashtags", 2 ]}}}
]);
1

There are 1 best solutions below

3
On BEST ANSWER

You can still use a fraction of the same aggregation operation minus the pipeline steps after the second group stage but for the filtering aspect you'd have to introduce a date range query in an initial $match pipeline step.

The following mongo shell examples show how you filter the aggregates for a particular date range:

1) Set a period from 2016-12-13 to 2016-12-14:

var startDate = new Date("2016-12-13");
startDate.setHours(0,0,0,0);

var endDate = new Date("2016-12-14");
endDate.setHours(23,59,59,999);
var pipeline = [
    { 
        "$match": {
            "date": { "$gte": startDate, "$lte": endDate }
        }
    }
    { "$unwind": "$hashtags" },
    {
        "$group": {
            "_id": {
                "date": "$dateString",
                "hashtag": "$hashtags"
            },
            "count": { "$sum": 1 }
        }
    },
    {
        "$group": {
            "_id": "$_id.date",
            "hashtags": { 
                "$push": { 
                    "hashtag": "$_id.hashtag",
                    "count": "$count"
                }
            }
        }
    }
]
db.comment.aggregate(pipeline)

2) Set a period from 2016-12-13 to 2016-12-17:

var startDate = new Date("2016-12-13");
startDate.setHours(0,0,0,0);

var endDate = new Date("2016-12-17");
endDate.setHours(23,59,59,999);
// run the same pipeline as above but with the date range query set as required

Spring Data Equivalent (untested):

import static org.springframework.data.mongodb.core.aggregation.Aggregation.*;

Aggregation agg = newAggregation(
    match(Criteria.where("date").gte(startDate).lte(endDate)),
    unwind("hashtags"),
    group("dateString", "hashtags").count().as("count"),
    group("_id.dateString")
        .push(new BasicDBObject
            ("hashtag", "$_id.hashtags").append
            ("count", "$count")
        ).as("hashtags") 
);
AggregationResults<Comment> results = mongoTemplate.aggregate(agg, Comment.class); 
List<Comment> comments = results.getMappedResults();