Searching for the right-most node of a materialized path tree

1.1k Views Asked by At

Is it possible to sort by a materialized path tree's path text field in order to find the right-most node of the tree? For example, consider this python function that uses django-treebeard's MP_Node:

def get_rightmost_node():
    """Returns the rightmost node in the current tree.

    :rtype: MyNode
    """
    # MyNode is a subclass of django-treebeard's MP_Node.
    return MyNode.objects.order_by('-path').first()

From all my testing, it seems to return what I expect, but I don't know how to come up with the math to prove it. And I haven't found any info on performing this operation on a materialized path tree.

Treebeard's implementation doesn't have separators in paths, so the paths look like this: 0001, 00010001, 000100010012, etc.

4

There are 4 best solutions below

1
Paul Griffin On BEST ANSWER

Short answer: No.

Here is a SQLFiddle demonstrating the problem I described in my comment.

For this simple setup:

id, path
1,  '1'
2,  '1\2'
3,  '1\3'
4,  '1\4'
5,  '1\5'
6,  '1\6'
7,  '1\7'
8,  '1\8'
9,  '1\9'
10, '1\10'

attempting to get the rightmost leaf (id = 10) with a simple sort will fail:

SELECT TOP 1
  id,
  path
FROM hierarchy
ORDER BY path DESC

returns:

id, path
9,  1\9

Because path is a text-based column, 1\10 will come after 1\9 in the descending sort (See the results of the second query in the fiddle).

Even if you began tracking depth and path length, which are generally cheap and easy to keep up with, it would be entirely possible to get paths like this:

path       depth  length
12\3\11\2  4      9
5\17\10\1  4      9

which would still not sort properly.

Even if you are using letters instead of numbers, this only pushes the problem horizon out to the 26th child instead of the 10th:

SQLFiddle using letters

I am not as familiar with materialized path operations as I am with nested set and adjacency lists, and have no experience with django, so I'll defer to others if there are methods of which I am unaware, but you will almost certainly have to perform some sort of parsing on the path column to consistently get the correct leaf.

EDIT - Having addressed the question of whether sorting is a valid solution, here are some additional notes on other potential solutions after a bit of discussion and thinking on the problem a bit:

-"Rightmost" is a vague term when nodes can have more than two children (i.e., the tree is not a binary tree). If a node has 10 children, which are to the left of the parent, and which are to the right? You must define this condition before you can define a solution to the problem.

-Once "rightmost" is properly defined for your problem space, understand that the rightmost node will not necessarily be on the lowest level of the tree:

        1
       / \
    1\1   1\2 <= This is the rightmost node
    /
  1\1\1 <= This is the lowest node

-Once "rightmost" is defined, a simple loop can be used to programatically find the rightmost node:

//in pseudocode
function GetRightmostNode(Node startNode)
{
  Node currentNode = startNode;

  while(currentNode.RightChildren != null)
  {
    currentNode = maximum of currentNode.RightChildren;
  }

  return currentNode;
}

This loop will look for children of the current node to the right of the current node. If they exist, it selects the rightmost of the right children and repeats. Once it reaches a node with no children to its right, it returns the current node, as it has found the rightmost node of the tree (or subtree) with startNode as its root.

2
nik On

You can use the method @Paul explained with a bit of modifications.You can append 0 in front of every digit and can have a uniformity in length of each path.

The nodes can be assigned path as,

id |  path
-----------------
1  |  '01'
2  |  '01\01'
3  |  '01\02'
4  |  '01\03'
5  |  '01\04'
6  |  '01\04\01'
7  |  '01\04\02'
8  |  '01\04\03'
9  |  '01\05\01'
10 |  '01\05\02'
11 |  '01\05\03'
12 |  '01\05\04'

The above example can be used if the number of child nodes of the node having maximum number of child nodes is less than 100.

If it's between 100 and 1000, then you can add an extra 0 as 001\003\002\005 like wise.

Then you can get the right most node 12 as,

SELECT TOP 1 id
FROM tree
ORDER BY path DESC

You can find the demo here. Demo

0
McCroskey On

EDIT: Paul Griffin rightly pointed out that my answer was unreliable, because it assumed that the nodes would be under a certain value. Here is a better attempt, incorporating two spins on Denis de Bernardy's depth function.

Use two sort criteria, one for the depth, and then one more for the value of the leftmost node converted to an integer:

SELECT path, 
       length(regexp_replace(path, '[^/]+', '', 'g')) as depth,
       regexp_replace(path, '^.*/', '')::int as last       
FROM test 
ORDER BY depth DESC, last DESC;

This will put the deepest node with the highest value at the top.

SQLFiddle

6
Denis de Bernardy On

Is it possible to sort by a materialized path tree's path text field in order to find the right-most node of the tree?

No. If node paths are stored like '/1/3/6/2' for instance, consider:

/1
/1/3
/1/3/6/2
/1/3/6/5
/1/3/6/21
/1/40

Refer to Paul's answer for the reasons sorting the above wouldn't work.

All hope is not lost though. If you're searching for "the right-most node", by which I assume you mean the deepest nodes in a tree, you could simply count the separators. For instance:

select length(regexp_replace('/1/3/6/2', '[^/]+', '', 'g')) as depth;

If you're looking for the max of that, use something like:

order by length(regexp_replace(path, '[^/]+', '', 'g')) desc

... or the equivalent python code. Index options include indexing the same expression, or storing the result in a separate depth field and indexing that.

If you're still interested in the actual value of the ID, the numbers in the above usually correspond to the ID, so order further using that column. If they're different, extract the right-most figure using a different regular expression, and cast it to integer so as to sort them naturally (1, 11, 2) instead of lexicographically (1, 11, 2):

select regexp_replace('/1/3/6/2', '^.+/', '')::int as value;