How does one find the lowest ancestor with a descendant leaf node with some label?

355 Views Asked by At

The problem

Given a rooted tree with n nodes, where all leaves are labelled from a set of labels. Build a datastructure which, given a leaf node, a and a label l, can find the lowest ancestor, u, of a, where u has at least one descendant with label l.

Linear Space / Linear Time approach

  • Start at leaf a
  • Examine all siblings of a
    • If a sibling has label l find the lowest-common-ancestor between a and that sibling.
    • Otherwise continue to parents
  • If all leaf-nodes descending from parents are not labelled l, continue to the grandparents and check their leaf-node descendants.
  • Continue recursively checking greater-grandparents and all their descendant leaf-nodes until an l-labelled descendant is found.

This has time complexity O(n) and space complexity O(n).


Is there a faster way to do this with linear space complexity? Perhaps by preproccessing the tree somehow? l and a are not fixed so the pre-processing has to be flexible.

The lowest common ancestor can be found in constant time using RMQ via Eulerian-Tour.

Keep in mind the tree is not balanced or sorted in any way.

2

There are 2 best solutions below

2
On BEST ANSWER

Here is a solution with O(log(n)^3) time complexity and O(n log(n)) space complexity.

Let L be the list of labels that you encounter on the Eulerian Path. You build a Segment Tree with this list, and store in each node of the tree the set of labels appearing in the corresponding segment. Then you can check in O(log(n)^2) time, if a label appears in a subtree via a range query in the segment tree.

To find the correct parent, you can do a binary search. E.g. something similar to binary lifting. Which will add another factor of log(n).

6
On

So, now I found a better solution:

The idea is the following: the further two nodes appear in the Euler Path, the higher their LCA is. I.e. index(a) < index(b) < index(c) => dist_to_root(LCA(a, b)) >= dist_to_root(LCA(a, c)).

This means that you only have to compute the LCA of a and the first node after a with the label l in the path, and the LCA of a and the last node before a with the label l in the path.

One of them will give the optimal solution to the problem.

To find these two indices efficiently, create a list of indices for each label, and perform a binary search in O(log n).

Memory complexity is O(n).