Why aggregate two collections with $out never ends ? MongoDB

121 Views Asked by At

OK, so, I have two collections. One of them has 6M documents, the other one has 25M documents. I want them in a new collection:

Collection 1 example:

Movie

 {"movieId" : "1", "name" : "Titanic"},
 {"movieId" : "2", "name" : "King Kong"}

Collection 2 example:

Character

{"characterId": "1", "movieId": "1", "characterName": "Name 1"},
{"characterId": "2", "movieId": "1", "characterName": "Name 2"},
{"characterId": "3", "movieId": "1", "characterName": "Name 3"}

So I want a new collection like:

{
  "movieId" : "1", 
  "name" : "Titanic", 
  "characters":[ *collection 2 here* ]
},
{
  "movieId" : "2", 
  "name" : "King Kong", 
  "characters":[]
}

I tried:

db.Movie.aggregate([{ $lookup: { from: "Character",localField: "movieId", foreignField: "movieId", as: "characters" }},{ $out : "movie_characters" }])

But it never ends :( (by never I mean like 10 hours later it was still thinking) If I execute it without the $out it show results in 10 minutes maybe.

Am I doing something wrong with $out?

Thanks for any advice.

2

There are 2 best solutions below

0
Vijay Rajpurohit On BEST ANSWER

The major thing you leaving here is an index.

$lookup when matching the foreign field, MongoDB uses the index.

So, create an index in the Character collection using:

db.Character.createIndex({ "movieId": 1 })

And then apply the lookup aggregation

db.Movie.aggregate([
  {
    $lookup: {
      from: "Character",
      localField: "movieId", 
      foreignField: "movieId", 
      as: "characters" 
    }
  },
  { 
    $out : "movie_characters" 
  }
])

Hope this will help :)

1
AlexZeDim On

The problem not just in a single index but in performance also. I highly recommend you to use Cursors for heavy queries like yours. (Check this answer for more info)

When you are using this query you also "flooding" your RAM with all the documents. So using cursor will definitely helps you.

Also, try to use Mongo Compass for testing your queries before launch it on production via Explain mode, which gives your all what youneed to know, especially how long your query takes to execute, what indexes it using, and much more.