On Mongodb, I have a "conversations2" collection where I keep chat data and a "user_blocked" collection that I keep blocked by the system. While pulling the chats according to userId, I need to pull the IDs in user_blocked that do not match otherUserId. My query below works, but it is too slow for users with a high number of messages (for example, there are 450 thousand records for a user with 32 IDs). Is there a chance I can speed this up or receive this query via a different route? (My goal is to get the user's unread message count, excluding any blocking) Thanks in advance for your help. By the way, userId_1 and otherUserId_1 indexes in conversations2 are added. The id_1 index is added in user_blocked.
db.conversations2.aggregate([
{
$match: {
userId: 32
}
},
{
$lookup: {
from: "user_blocked",
localField: "otherUserId",
foreignField: "id",
as: "blockedUsers"
}
},
{
$match: {
blockedUsers: {
$eq: []
}
}
},
{
$group: {
_id: "$userId",
unreadMessageCount: {
$sum: "$unreadMessageCount"
}
}
}
])
conversations2 collection example data ;
{
"_id": {
"$oid": "65c0f64030054c4b8f0481a0"
},
"otherUserId": {
"$numberLong": "45"
},
"userId": {
"$numberLong": "32"
},
"lastMessage": "test",
"lastMessageTime": {
"$date": "2024-02-21T10:36:44.592Z"
},
"lastMessageType": 1,
"lastMessageWay": "in",
"unreadMessageCount": 29
}
user_blocked example data;
{
"_id": {
"$oid": "66033f989bba279fe7d0862a"
},
"id": {
"$numberLong": "45"
}
}
1. Only check for
unread > 0Since you're getting only
unreadMessageCountin this query, the first minor optimisation is to add that to your first$matchstage. Since the total is unchanged when it's0, whether the user is blocked or not.2. Uncorrelated Subquery with
$lookup2A.
We can also try to optimise the number of
$lookups which actually occur. By using an Uncorrelated Subquery with$lookup- ie it only runs once and not for every document.Mongo Playground with the correct result of unread
10However, if your
user_blockedcollection is too big, you may hit the 16MB per-stage limit (1.3 mil ids for Ints, ~600k for Longs), followed by the 100MB limit withallowDiskUse: true. In that case use variation B below:2B.
This aggregation pipeline will reduce the total number of lookups needed and has less chance of running into stage-size limits but it still might. Steps:
otherIDsinto a single arrayuser_blockedvalidOtherIDsconversations2but only with thevalidOtherIDsMongo Playground
3. ‼ No lookups, Add a field
otherUserBlockedThis optimisation will require a data/structure change but is the most scalable and most performant:
Add a field like
otherUserBlocked: true/falseand index it. Initially default it tofalse(no users blocked) and then set it totrueusing a similar pipeline to the one you already have.If you skip initial defaults, you'll need to have clauses
{ otherUserBlocked: { $exists: true } }added to the queries below.Each time a user gets blocked, you're already adding them to
user_blockedcollection. Add another step to also updateconversations2with{ $match: { otherUserId: blocked_user_id } }and setotherUserBlocked: true. Something like:And if they get unblocked, set it
false.Your aggregation pipeline can then use this in the first
$matchstage and eliminates the need for the$lookupcompletely, as well as the second$matchstage. The pipeline becomes:Mongo Playground with the new field
Both of these changes will also be useful when you want to actually show the unread messages, and not just the count.
More about MongoDB Schema Design best practices.