Speed up own GraphLookup implementation over two collections

51 Views Asked by At

I store millions of nodes in two collections. First, larger one, is trustedNodes and other one is pendingNodes. Each node has UUID _id and UUID parentNodeId fields alongside with text content field. I need to build a rootPath - ordered list of documents from a leaf node to a root one using both trusted and pending nodes preferring trusted over pending ones if its possible (sometimes middle nodes are missing and full path can not be built). (average path depth is like 3; with longest one ~100)

i.e. building path on node with _id = 5 in these two collections:

"trustedNodes": [
{
  "_id": "1",
  "content": "someContent1",
  "parentNodeId": null
},
{
  "_id": "2",
  "content": "someContent2",
  "parentNodeId": "1"
},
{
  "_id": "3",
  "content": "someContent3",
  "parentNodeId": "2"
},
{
  "_id": "5",
  "content": "someContent5",
  "parentNodeId": "4"
},
{
  "_id": "7",
  "content": "someContent7",
  "parentNodeId": "2"
}
]

"pendingNodes": [
{
  "_id": "2",
  "content": "other someContent2",
  "parentNodeId": "1"
},
{
  "_id": "4",
  "content": "pending someContent4",
  "parentNodeId": "3"
}
]

Should return following document

"_id": "5",
"content": "someContent5",
"pathIsComplete": true,
"rootPath": [
    {
      "_id": "1",
      "content": "someContent1",
      "parentNodeId": null
    },
    {
      "_id": "2",
      "content": "someContent2",
      "parentNodeId": "1"
    },
    {
      "_id": "3",
      "content": "someContent3",
      "parentNodeId": "2"
    },  
    {
      "_id": "4",
      "content": "pending someContent4",
      "parentNodeId": "3"
    }
]

For now it's done in the code by doing up to 2*N requests to the db where N is resulting path length. Some pseudocode:

  1. db.trustedNodes.find( { _id: 5 } )
  2. if none found at step 1 then db.pendingNodes.find( { _id: 5 } )
  3. set foundNode from find result, if not null - append to result array
  4. db.trustedNodes.find({ _id: $foundNode.parentNodeId })
  5. if none found at step 4 then db.pendingNodes.find({ _id: $foundNode.parentNodeId })
  6. repeat steps 3-5 until foundNode or foundNode.parentNodeId is null
  7. return resulting array with pathIsComplete true if foundNode.parentNodeId is null

Can this be done faster by using aggregation over indexes or by fewer requests?

Before I tried creating a view with $unionWith -> $set -> $group view pipeline then doing $graphLookup over it. But it was slow because views can not have indexes. Having on demand materialized view with index didn't work also because I need strong consistency and updating this materialized view each time I need to do a graphLookup is also slow.

For now I see only one optimisation that can be made: storing previousy computed trusted paths(trusted paths can not change), but first I wanna explore if there are no more elegant solutions

0

There are 0 best solutions below