MongoDB $group aggregation yields _id with multiple values as an array; how to remove duplicates from _id?


I am trying to run a very simple aggregation to collect the indexes (serial numbers) associated with a particular owner. My query is as follows (in Moped syntax):

owners = Serials.collection.aggregate([
  { '$group' => {
    '_id'     => '$owners.owner.party_name',
    'serials' => { '$addToSet' => '$serial_number' }
  } }
])

That's the entire function. The issue is that the 'owners.owner' field can take two forms: it is often an array of embedded documents, with multiple party names associated with the record, but it can also be a single embedded document:

Form 1:

"owners": {
 "owner": [
   {
     "entry_number": "1",
     "party_name": "Company Name, LLC",
     "other_fields": "other info",
   },
   {
     "entry_number": "1",
     "party_name": "Company Name, LLC",
     "other_fields": "other info",
   }
 ]
},

(Yes, the entries often repeat within the array. Sometimes there are two or more distinct owners.)

Form 2:

"owners": {
  "owner": {
    "entry_number": "1",
    "party_name": "Another Company, Inc.",
    "other_fields": "other_info",
  }
},

Notice that 'owner' is not an array in this case. I'm therefore not sure an $unwind step would work in the aggregation, because I expect the documents without an embedded array to raise an error.
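To make the two shapes concrete, here is a minimal client-side sketch in plain Ruby that returns the distinct party names for one document, whichever form 'owners.owner' takes. The helper name distinct_party_names is hypothetical (it is not part of Moped or MongoDB), and the sketch assumes documents arrive as Ruby hashes with string keys, as in the examples above.

```ruby
# Hypothetical helper (not a Moped or MongoDB API): returns the distinct
# party names of one document for either shape of 'owners.owner'.
# Assumes the document is a Hash with string keys.
def distinct_party_names(doc)
  owner = (doc['owners'] || {})['owner']
  # Wrap a lone embedded document in an array. Kernel#Array is avoided on
  # purpose: it would turn a Hash into an array of [key, value] pairs.
  entries = owner.is_a?(Array) ? owner : [owner].compact
  # Keep the first occurrence of each name, preserving the original order.
  entries.map { |entry| entry['party_name'] }.compact.uniq
end

form1 = { 'owners' => { 'owner' => [
  { 'entry_number' => '1', 'party_name' => 'Company Name, LLC' },
  { 'entry_number' => '1', 'party_name' => 'Company Name, LLC' }
] } }
form2 = { 'owners' => { 'owner' =>
  { 'entry_number' => '1', 'party_name' => 'Another Company, Inc.' } } }

p distinct_party_names(form1) # prints ["Company Name, LLC"]
p distinct_party_names(form2) # prints ["Another Company, Inc."]
```

This only normalizes the shape on the client; it says nothing about how $unwind itself treats the non-array form, which may depend on the server version.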

The aggregation yields records that look like this:

{"_id"=>["Random co.", "Random co."], "serials"=>["12345678"]}

but also records that look like this:

{"_id"=>["Company 1 co.", "Company 2 co."], "serials"=>["12345679", "12345778", "14562378", "87654321", "33822112", "11111111"]}

That is, the '_id' array sometimes contains the same party name repeated, and sometimes contains two or more distinct names.

My question is, how can I further refine this aggregation to remove duplicate strings from the '_id' field, and only preserve distinct values?

So, for example, in the first case the result would be:

 {"_id"=>["Random co."], "serials"=>["12345678"]}

In the second case, the result would be unchanged.
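For illustration, the desired output can be sketched as a post-processing step over the aggregation results in plain Ruby. This is a client-side workaround under stated assumptions, not a server-side pipeline: the helper name dedupe_owner_groups is hypothetical, results are assumed to be hashes with string keys, and groups whose '_id' becomes identical after deduplication are assumed to belong together, so their serials are merged.

```ruby
# Hypothetical helper (not a Moped or MongoDB API): removes duplicate names
# from each '_id' and merges groups that end up with the same '_id'.
def dedupe_owner_groups(results)
  merged = Hash.new { |hash, key| hash[key] = [] }
  results.each do |doc|
    # Array() wraps a bare string '_id' (Form 2) and leaves arrays as they are.
    # Name order is preserved, so ["A", "B"] and ["B", "A"] stay separate groups.
    key = Array(doc['_id']).uniq
    # Set union keeps each serial number once per group.
    merged[key] |= Array(doc['serials'])
  end
  merged.map { |names, serials| { '_id' => names, 'serials' => serials } }
end

results = [
  { '_id' => ['Random co.', 'Random co.'], 'serials' => ['12345678'] },
  { '_id' => ['Company 1 co.', 'Company 2 co.'],
    'serials' => ['12345679', '12345778'] }
]

p dedupe_owner_groups(results).first['_id'] # prints ["Random co."]
```

Doing this on the client means every group is pulled into memory first, which may be acceptable for a modest number of owners but is worth reconsidering for large collections.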
