How to remove duplicate values inside a list in mongodb

Question

10.1k Views Asked by station At 29 July 2025 at 03:40

I have a MongoDB collection. When I run

db.bill.find({})

I get,

{ 
    "_id" : ObjectId("55695ea145e8a960bef8b87a"),
    "name" : "ABC. Net", 
    "code" : "1-98tfv",
    "abbreviation" : "ABC",
    "bill_codes" : [  190215,  44124,  190215,  147708 ],
    "customer_name" : "abc"
}

I need an operation that removes the duplicate values from bill_codes. The final document should be:

{ 
    "_id" : ObjectId("55695ea145e8a960bef8b87a"),
    "name" : "ABC. Net", 
    "code" : "1-98tfv",
    "abbreviation" : "ABC",
    "bill_codes" : [  190215,  44124,  147708 ],
    "customer_name" : "abc"
}

How can I achieve this in MongoDB?

There are 4 best solutions below

anhlc On 18 June 2015 at 01:49

You can use a foreach loop with some javascript:

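db.bill.find().forEach(function(entry){
     var arr = entry.bill_codes;
     // keep only the first occurrence of each value (order is preserved)
     var uniqueArray = arr.filter(function(elem, pos) {
        return arr.indexOf(elem) == pos;
     });
     // write back only the deduplicated field (save() is deprecated in newer shells)
     db.bill.updateOne(
        { "_id": entry._id },
        { "$set": { "bill_codes": uniqueArray } }
     );
})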
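
The first-occurrence filter used above can also be written with a Set, which avoids the repeated indexOf scans on long arrays. This is only a sketch: dedupePreservingOrder is a hypothetical helper name, not something from the answer, and it assumes the codes are plain numbers or strings (a Set compares objects by reference, so wrapped values such as NumberLong would likely not be deduplicated).

```javascript
// Hypothetical helper: keeps the first occurrence of each value, in order.
// Assumes primitive values (numbers, strings).
function dedupePreservingOrder(values) {
  // A Set remembers insertion order, so the first occurrence wins.
  return Array.from(new Set(values));
}

var billCodes = [190215, 44124, 190215, 147708];
var uniqueCodes = dedupePreservingOrder(billCodes);
console.log(uniqueCodes); // [ 190215, 44124, 147708 ]
```

Inside the forEach loop, the helper would simply take the place of the arr.filter(...) call.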

Dennis Golomazov On 24 September 2018 at 17:58

MongoDB 3.4+ has the $addFields aggregation stage, which allows you to avoid explicitly listing all the other fields in $project:

db.bill.aggregate([
    {"$addFields": {
        "bill_codes": {"$setUnion": ["$bill_codes", []]}
    }}
])

Just for reference, here is another (more lengthy) way that uses $replaceRoot and also doesn't require listing all possible fields (the $mergeObjects operator it relies on requires MongoDB 3.6+):

db.bill.aggregate([
    {'$unwind': {
        'path': '$bill_codes',
        // output the document even if its list of bill codes is empty
        'preserveNullAndEmptyArrays': true
    }},
    {'$group': {
        '_id': '$_id',
        'bill_codes': {'$addToSet': '$bill_codes'},
        // arbitrary name that doesn't exist on any document
        '_other_fields': {'$first': '$$ROOT'},
    }},
    {
      // the field, in the resulting document, has the value from the last document merged for the field. (c) docs
      // so the new deduped array value will be used
      '$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
    },
    {'$project': {'_other_fields': 0}}
])

prasad_ On 06 September 2019 at 10:45

Starting with MongoDB 4.2, the update parameter of the collection's updateMany method can also be an aggregation pipeline (instead of a document). The pipeline supports the $set, $unset and $replaceWith stages. Using the $setIntersection aggregation operator with the $set stage, you can remove the duplicates from an array field and update the collection in a single operation.

An example:

arrays collection:

{ "_id" : 0, "a" : [ 3, 5, 5, 3 ] }
{ "_id" : 1, "a" : [ 1, 2, 3, 2, 4 ] }

From the mongo shell:

db.arrays.updateMany(
   {  },
   [
      { $set: { a: { $setIntersection: [ "$a", "$a" ] } } }
   ]
)

The updated arrays collection:

{ "_id" : 0, "a" : [ 3, 5 ] }
{ "_id" : 1, "a" : [ 1, 2, 3, 4 ] }

The other update methods, update(), updateOne() and findAndModify(), also have this feature.
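
Applied to the bill collection from the question, the same idea might look like the sketch below. It assumes MongoDB 4.2+, uses $setUnion with an empty array (as in the other answers) instead of $setIntersection, and restricts the update to documents where bill_codes is actually an array. The variable names are illustrative. The db call is guarded so the snippet only touches a database when run inside the mongo shell.

```javascript
// Only touch documents whose bill_codes field is an array.
var dedupeFilter = { bill_codes: { $type: "array" } };

// Aggregation-pipeline update: replace bill_codes with its distinct values.
// The order of the resulting elements is not guaranteed by $setUnion.
var dedupePipeline = [
  { $set: { bill_codes: { $setUnion: ["$bill_codes", []] } } }
];

// "db" exists only inside the mongo shell; skip the call elsewhere.
if (typeof db !== "undefined") {
  var result = db.bill.updateMany(dedupeFilter, dedupePipeline);
  printjson(result);
}
```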

**AudioBubble** · Accepted Answer

Well, you can do this using the aggregation framework as follows:

collection.aggregate([
    { "$project": {
        "name": 1,
        "code": 1,
        "abbreviation": 1,
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] },
        "customer_name": 1
    }}
])

The $setUnion operator is a "set" operator, so its result is a "set" in which only the "unique" items are kept. Note that the order of the elements in the result is not guaranteed.

If you are still using a MongoDB version older than 2.6 then you would have to do this operation with $unwind and $addToSet instead:

collection.aggregate([
    { "$unwind": "$bill_codes" },
    { "$group": {
        "_id": "$_id",
        "name": { "$first": "$name" },
        "code": { "$first": "$code" },
        "abbreviation": { "$first": "$abbreviation" },
        "bill_codes": { "$addToSet": "$bill_codes" },
        "customer_name": { "$first": "$customer_name" }
    }}
])

It's not as efficient but the operators are supported since version 2.2.

Of course, if you actually want to modify your collection documents permanently, you can expand on this and process the updates for each document accordingly. You can retrieve a "cursor" from .aggregate() and iterate it, basically following this shell example:

db.collection.aggregate([
    { "$project": {
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] },
        "same": { "$eq": [
            { "$size": "$bill_codes" },
            { "$size": { "$setUnion": [ "$bill_codes", [] ] } }
        ]}
    }},
    { "$match": { "same": false } }
]).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "bill_codes": doc.bill_codes } }
    )
})

A bit more involved for earlier versions:

db.collection.aggregate([
    { "$unwind": "$bill_codes" },
    { "$group": {
        "_id": { 
            "_id": "$_id",
            "bill_code": "$bill_codes"
        },
        "origSize": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id._id",
        "bill_codes": { "$push": "$_id.bill_code" },
        "origSize": { "$sum": "$origSize" },
        "newSize": { "$sum": 1 }
    }},
    { "$project": {
        "bill_codes": 1,
        "same": { "$eq": [ "$origSize", "$newSize" ] }
    }},
    { "$match": { "same": false } }
]).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "bill_codes": doc.bill_codes } }
    )
})

The added operations in there compare the length of the "de-duplicated" array with the length of the original array, and return only those documents that had "duplicates" removed for processing on updates.
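
The size comparison can also be done client-side. The sketch below mimics the check on in-memory sample documents and builds one updateOne entry per changed document, in the shape accepted by bulkWrite(). The sample data and variable names are made up for illustration, and actually sending the operations (for example with db.collection.bulkWrite(ops)) is left out so the snippet runs without a database.

```javascript
// Sample documents standing in for the result of db.collection.find().
var docs = [
  { _id: 0, bill_codes: [190215, 44124, 190215, 147708] },
  { _id: 1, bill_codes: [1, 2, 3] }
];

var ops = [];
docs.forEach(function (doc) {
  var unique = Array.from(new Set(doc.bill_codes));
  // Equal lengths mean nothing was removed, so no update is needed.
  if (unique.length !== doc.bill_codes.length) {
    ops.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { bill_codes: unique } }
      }
    });
  }
});

console.log(JSON.stringify(ops));
```

Only the first sample document produces an operation, because the second one has no duplicates to remove.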

Probably should add the "for python" note here as well. If you don't care about "identifying" the documents that contain duplicate array entries and are prepared to "blast" the whole collection with updates, then just use Python's set() in the client code to remove the duplicates:

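# "collection" is assumed to be a pymongo Collection object
for doc in collection.find():
    # set() drops duplicates but does not preserve the original order
    collection.update_one(
       { "_id": doc["_id"] },
       { "$set": { "bill_codes": list(set(doc["bill_codes"])) } }
    )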

So that's quite simple and it depends on which is the greater evil, the cost of finding the documents with duplicates or updating every document whether it needs it or not.

This at least covers techniques.
