Finding document containing array of nested names in pymongo (CrossRef data)

38 Views Asked by At

I have a dataset of CrossRef works records stored in a collection called works in MongoDB and I am using a Python application to query this database.

I am trying to find documents based on one author's name. Removing extraneous details, a document might look like this:

{'DOI':'some-doi',
'author':[{'given': 'Albert','family':'Einstein',affiliation:[]},
{'given':'R.','family':'Feynman',affiliation:[]},
{'given':'Paul','family':'Dirac',affiliation:['University of Florida']}]
}

It isn't clear to me how to combine the queries to get just Albert Einstein's papers.

I have indexes on author.family and author.given, I've tried:

cur = works.find({'author.family':'Einstein','author.given':'Albert'})

This returns all of the documents by people called 'Albert' and all of those by people called 'Einstein'. I can filter this manually, but it's obviously less than ideal.

I also tried:

cur = works.find({'author':{'given':'Albert','family':'Einstein','affiliation':[]}})

But this returns nothing (after a very long delay). I've tried this with and without 'affiliation'. There are a few questions on SO about querying nested fields, but none seem to concern the case where we're looking for 2 specific things in 1 nested field.

1

There are 1 best solutions below

3
On BEST ANSWER

Your issue is that author is a list.

You can use an aggregate query to unwind this list to objects, and then your query would work:

cur = works.aggregate([{'$unwind': '$author'},
                       {'$match': {'author.family':'Einstein', 'author.given':'Albert'}}])

Alternatively, use $elemMatch which matches on arrays that match all the elements specified.

cur = works.find({"author": {'$elemMatch': {'family': 'Einstein', 'given': 'Albert'}}})

Also consider using multikey indexes.