I have a table representing the transitive closure of an organizational hierarchy (i.e., its a tree with a single root):
create table ancestry (
ancestor integer,
descendant integer,
distance integer
);
I have another table that contains the organizations that each user is allowed to access:
create table accessible (
user integer,
organization integer
);
The system shows the user a roll-up of expenditures associated with each organization the user can access. I could always start by showing the user a view of the company (i.e., the root) showing the user a list of immediate child organizations and how much his organizations contribute to the total. In most cases, there would be a single child and the user would be required to drill-down several levels before seeing multiple children. I would prefer to start the presentation with the first organization that shows multiple children (i.e., the LCA).
For a given user, I can find the set of paths to the root easy enough but am having trouble finding the least common ancestor. I am using postgresql 9.1 but would prefer a solution that is database agnostic. In the worst case, I can pull the paths to root back into the application's code and calculate the LCA there.
I took a fresh look at this and developed the following solution. I used a common-table-expression to make it easier to understand how it operates but it could easily be written using a sub-query.
The hit CTE counts, for each organization in the hierarchy, the number of paths from a child to the root that traverse the organization. The LCA is then the organization with the most traversals. In the event of a tie, the organization farthest from the root (i.e., max(distance)) is the actual LCA. This is best illustrated with an example.
Assuming we wish to find the LCA of nodes C and D from the tree above. The hit CTE produces the following counts:
The main query adds the distance:
The main query then orders the results by descending count and distance
The LCA is the first item in the list.