MongoDB: find documents with a given array of subdocuments


I want to find documents that contain given subdocuments. Say I have the following documents in my commits collection:

// Document 1
{ 
  "commit": 1,
  "authors" : [
    {"name" : "Joe", "lastname" : "Doe"},
    {"name" : "Joe", "lastname" : "Doe"}
  ] 
}

// Document 2
{ 
  "commit": 2,
  "authors" : [
    {"name" : "Joe", "lastname" : "Doe"},
    {"name" : "John", "lastname" : "Smith"}
  ] 
}

// Document 3
{ 
  "commit": 3,
  "authors" : [
    {"name" : "Joe", "lastname" : "Doe"}
  ] 
}

From the collection above I want only the 1st document, because I'm looking for a commit with 2 authors where both have the same name and lastname. So I came up with this query:

db.commits.find({
  $and: [
    { 'authors': { $elemMatch: { 'name': 'Joe', 'lastname': 'Doe' } } },
    { 'authors': { $elemMatch: { 'name': 'Joe', 'lastname': 'Doe' } } }
  ],
  'authors': { $size: 2 }
})

$size filters out the 3rd document, but the query still returns the 2nd document: each $elemMatch clause only asks whether some element matches, so the same "Joe Doe" element satisfies both clauses.
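A plain-JavaScript model of that query makes the problem concrete. This is a sketch, not MongoDB's matcher: the hypothetical elemMatch helper below only handles equality on top-level fields, which is all this query needs.

```javascript
// Plain-JavaScript model of the find() semantics; nothing here talks to MongoDB.
const commits = [
  { commit: 1, authors: [{ name: "Joe", lastname: "Doe" }, { name: "Joe", lastname: "Doe" }] },
  { commit: 2, authors: [{ name: "Joe", lastname: "Doe" }, { name: "John", lastname: "Smith" }] },
  { commit: 3, authors: [{ name: "Joe", lastname: "Doe" }] },
];

// Hypothetical helper: true when AT LEAST ONE element equals every field in cond.
function elemMatch(array, cond) {
  return array.some((el) => Object.keys(cond).every((k) => el[k] === cond[k]));
}

const joe = { name: "Joe", lastname: "Doe" };

// Mirrors the query: two identical $elemMatch clauses plus $size: 2.
const matched = commits.filter(
  (doc) =>
    elemMatch(doc.authors, joe) &&
    elemMatch(doc.authors, joe) && // the same element can satisfy this clause again
    doc.authors.length === 2
);

console.log(matched.map((d) => d.commit)); // [ 1, 2 ]
```

Nothing in either clause says "a different element than the one the other clause used", which is why commit 2 is returned.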

I can't use array indexes on the subdocuments, since the order of the authors used in the search is random. Is there a way to remove the 2nd document from the results without using Mongo's aggregate function?


BEST ANSWER

What you are asking for is a little different from a standard query: you want to identify documents where the "name" and "lastname" combination appears in the array two or more times.

Standard query operators cannot test "how many times" an array element is matched within a document. You can, however, ask the server to "count" that for you using the aggregation framework:

db.collection.aggregate([
    // Match possible documents to reduce the pipeline
    { "$match": {
        "authors": { "$elemMatch": { "name": "Joe", "lastname": "Doe" } }
    }},

    // Unwind the array elements for processing
    { "$unwind": "$authors" },

    // Group back and "count" the matching elements
    { "$group": {
        "_id": "$_id",
        "commit": { "$first": "$commit" },
        "authors": { "$push": "$authors" },
        "count": { "$sum": {
            "$cond": [
                { "$and": [
                    { "$eq": [ "$authors.name", "Joe" ] },
                    { "$eq": [ "$authors.lastname", "Doe" ] }
                ]},
                1,
                0
            ]
        }}
    }},

    // Filter out anything that didn't match at least twice
    { "$match": { "count": { "$gte": 2 } } }
])
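To see what each stage does, here is a plain-JavaScript model of the same pipeline run over the three sample documents. It is only an illustration, not how the server executes the pipeline; the numeric _id values are assumptions standing in for the ObjectIds MongoDB would assign, and isJoeDoe is a hypothetical helper playing the role of the $and inside $cond.

```javascript
// Plain-JavaScript model of the aggregation pipeline; nothing here talks to MongoDB.
// The numeric _id values stand in for the ObjectIds the server would assign.
const commits = [
  { _id: 1, commit: 1, authors: [{ name: "Joe", lastname: "Doe" }, { name: "Joe", lastname: "Doe" }] },
  { _id: 2, commit: 2, authors: [{ name: "Joe", lastname: "Doe" }, { name: "John", lastname: "Smith" }] },
  { _id: 3, commit: 3, authors: [{ name: "Joe", lastname: "Doe" }] },
];

// Hypothetical helper playing the role of the $and inside $cond.
const isJoeDoe = (a) => a.name === "Joe" && a.lastname === "Doe";

// $match: keep documents with at least one matching author ($elemMatch).
const candidates = commits.filter((doc) => doc.authors.some(isJoeDoe));

// $unwind: emit one record per array element.
const unwound = candidates.flatMap((doc) =>
  doc.authors.map((author) => ({ _id: doc._id, commit: doc.commit, authors: author }))
);

// $group: rebuild each document and add 1 or 0 per element ($sum over $cond).
const groups = new Map();
for (const row of unwound) {
  if (!groups.has(row._id)) {
    // $first: commit is taken from the first record seen for this _id.
    groups.set(row._id, { _id: row._id, commit: row.commit, authors: [], count: 0 });
  }
  const group = groups.get(row._id);
  group.authors.push(row.authors); // $push
  group.count += isJoeDoe(row.authors) ? 1 : 0; // $cond: [match, 1, 0]
}

// Final $match: keep documents whose author matched at least twice.
const result = [...groups.values()].filter((g) => g.count >= 2);

console.log(result.map((g) => g.commit)); // [ 1 ]
```

Only commit 1 survives: commits 2 and 3 each contain a single Joe Doe, so their count is 1.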

So essentially you put the conditions to match inside the $cond operator, which returns 1 where an element matches and 0 where it does not; those values are passed to $sum to get a total for each document.

The final $match then filters out any documents that did not match 2 or more times.
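If the author or the threshold varies, the pipeline can be generated instead of written out by hand. The sketch below is a hypothetical helper (buildCountPipeline is not part of MongoDB); for { name: "Joe", lastname: "Doe" } and 2 it produces the same pipeline as the answer, and it assumes the author object holds only plain equality values.

```javascript
// Hypothetical helper: builds the counting pipeline for any author and threshold.
function buildCountPipeline(author, minCount) {
  // One { $eq: ["$authors.<field>", value] } test per field of the author object.
  const fieldTests = Object.entries(author).map(([field, value]) => ({
    $eq: ["$authors." + field, value],
  }));

  return [
    { $match: { authors: { $elemMatch: author } } },
    { $unwind: "$authors" },
    {
      $group: {
        _id: "$_id",
        commit: { $first: "$commit" },
        authors: { $push: "$authors" },
        count: { $sum: { $cond: [{ $and: fieldTests }, 1, 0] } },
      },
    },
    { $match: { count: { $gte: minCount } } },
  ];
}

// Usage in the mongo shell (requires a running server):
//   db.commits.aggregate(buildCountPipeline({ name: "Joe", lastname: "Doe" }, 2))
```

Because the last stage uses $gte, a commit with three authors, two of them Joe Doe, would also match. To require exactly two authors, as in the question, "authors": { "$size": 2 } could be added to the first $match.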