Postgres LTREE: How to do recursive for only a few specific rows?

265 Views Asked by At

My actual use case is similar to how comment threads work on Reddit for example. If I do the query based on the LTREE path for the post itself, then it works fine and brings all the child comments. But if I do the query on permalink of a nested child comment, then I want it to return both the parent post and also all the children from that child comment onwards.

My actual query is fairly complex but I am able to demonstrate the issue with this simple example.

I have the following table:

CREATE TABLE nodes (node_path ltree,score INTEGER);
INSERT INTO nodes VALUES ('Top',150);
INSERT INTO nodes VALUES ('Top.Science',23);
INSERT INTO nodes VALUES ('Top.Science.Astronomy',5);
INSERT INTO nodes VALUES ('Top.Science.Astronomy.Astrophysics',43);
INSERT INTO nodes VALUES ('Top.Science.Astronomy.Cosmology',23);
INSERT INTO nodes VALUES ('Top.Hobbies',16);
INSERT INTO nodes VALUES ('Top.Hobbies.Amateurs_Astronomy',2);
INSERT INTO nodes VALUES ('Top.Collections',65);
INSERT INTO nodes VALUES ('Top.Collections.Pictures',88);
INSERT INTO nodes VALUES ('Top.Collections.Pictures.Astronomy',36);
INSERT INTO nodes VALUES ('Top.Collections.Pictures.Astronomy.Stars',97);
INSERT INTO nodes VALUES ('Top.Collections.Pictures.Astronomy.Galaxies',58);
INSERT INTO nodes VALUES ('Top.Collections.Pictures.Astronomy.Astronauts',6);
INSERT INTO nodes VALUES ('Top.Dislikes',28);
CREATE INDEX ON nodes (score);
CREATE INDEX ON nodes USING GIST (node_path);
CREATE INDEX ON nodes USING BTREE (node_path);

I need to query the data such that Top.Science and Top.Collections nodes are present plus I also want all the nodes which are or are descendants of the path *.Top.Collections.Pictures.Astronomy.*. I basically want it to not do recursion for the top 2 rows and do it for the children ones.

The correct output I want is:

                   node_path                   | score | sort_path 
-----------------------------------------------+-------+-----------
 Top.Science                                   |    23 | {1}
 Top.Collections                               |    65 | {2}
 Top.Collections.Pictures.Astronomy            |    36 | {2,2}
 Top.Collections.Pictures.Astronomy.Astronauts |     6 | {2,2,1}
 Top.Collections.Pictures.Astronomy.Galaxies   |    58 | {2,2,2}
 Top.Collections.Pictures.Astronomy.Stars      |    97 | {2,2,3}

The first query I tried was:

with recursive
t as (select * from nodes where node_path = ANY('{Top.Science,Top.Collections}') OR node_path ~ '*.Top.Collections.Pictures.Astronomy.*'::lquery),
l as (select nlevel(node_path) n from t order by n limit 1),
base as (
  select node_path, score, 
         array[row_number() over (order by score ASC, node_path)] as sort_path
    from t, l 
   where nlevel(node_path) = l.n
  union all
  select c.node_path, c.score, 
         p.sort_path||row_number() over (order by c.score ASC, c.node_path)
    from base p
    join t c 
      on subpath(c.node_path, 0, -1) = p.node_path
)
select * from base order by sort_path;

This gives me only the following data:

    node_path    | score | sort_path 
-----------------+-------+-----------
 Top.Science     |    23 | {1}
 Top.Collections |    65 | {2}
(2 rows)

I understand the = in the subpath(c.node_path, 0, -1) = p.node_path is what causes only these 2 rows to be returned.

So I changed the = to <@ as follows:

with recursive
t as (select * from nodes where node_path = ANY('{Top.Science,Top.Collections}') OR node_path ~ '*.Top.Collections.Pictures.Astronomy.*'::lquery),
l as (select nlevel(node_path) n from t order by n limit 1),
base as (
  select node_path, score, 
         array[row_number() over (order by score ASC, node_path)] as sort_path
    from t, l 
   where nlevel(node_path) = l.n
  union all
  select c.node_path, c.score, 
         p.sort_path||row_number() over (order by c.score ASC, c.node_path)
    from base p
    join t c 
      on subpath(c.node_path, 0, -1) <@ p.node_path
)
select * from base order by sort_path;

This gives me the following:

                   node_path                   | score | sort_path 
-----------------------------------------------+-------+-----------
 Top.Science                                   |    23 | {1}
 Top.Collections                               |    65 | {2}
 Top.Collections.Pictures.Astronomy.Astronauts |     6 | {2,1}
 Top.Collections.Pictures.Astronomy            |    36 | {2,2}
 Top.Collections.Pictures.Astronomy.Astronauts |     6 | {2,2,1}
 Top.Collections.Pictures.Astronomy.Galaxies   |    58 | {2,2,2}
 Top.Collections.Pictures.Astronomy.Stars      |    97 | {2,2,3}
 Top.Collections.Pictures.Astronomy.Galaxies   |    58 | {2,3}
 Top.Collections.Pictures.Astronomy.Stars      |    97 | {2,4}
(9 rows)

While now all the rows are there, the children rows are repeated. I think it's due to the recursion but I am having a hard time understanding why and how to fix it.

Note that if I remove the WHERE for the top 2 rows and only query for the children ones, then it gives me the children as I expect:

with recursive
t as (select * from nodes where node_path ~ '*.Top.Collections.Pictures.Astronomy.*'::lquery),
l as (select nlevel(node_path) n from t order by n limit 1),
base as (
  select node_path, score, 
         array[row_number() over (order by score ASC, node_path)] as sort_path
    from t, l 
   where nlevel(node_path) = l.n
  union all
  select c.node_path, c.score, 
         p.sort_path||row_number() over (order by c.score ASC, c.node_path)
    from base p
    join t c 
      on subpath(c.node_path, 0, -1) = p.node_path
)
select * from base order by sort_path;

One solution I do have in mind is to query the top 2 rows separately, query the children separately, then UNION then together. However, that seems unnecessarily complicated.

EDIT: My question seems similar to how to get the tree where some of the rows are missing. For example, how to get the tree of the following data:

                   node_path                   | score 
-----------------------------------------------+-------
 Top.Science                                   |    23 
 Top.Collections                               |    65 
 Top.Collections.Pictures.Astronomy            |    36 
 Top.Collections.Pictures.Astronomy.Astronauts |     6 
 Top.Collections.Pictures.Astronomy.Galaxies   |    58 
 Top.Collections.Pictures.Astronomy.Stars      |    97 

Notice how the Top.Collections.Pictures path is missing in this.

1

There are 1 best solutions below

0
JohnH On

The following doesn't produce the exact output described in the original post; however, it demonstrates an approach that can be used to selectively recurse:

WITH RECURSIVE
search_paths(search_path, get_descendents) AS (
  VALUES ('Top.Science'::lquery, false),
         ('Top.Collections'::lquery, false),
         ('*.Top.Collections.Pictures.Astronomy.*'::lquery, true)),
cte AS (SELECT n.*,
               p.get_descendents,
               array[row_number() OVER (ORDER BY nlevel(n.node_path), n.score, n.node_path)] AS sort_path
          FROM nodes n
          JOIN search_paths p
            ON n.node_path ~ p.search_path
        UNION ALL
        SELECT n.*,
               cte.get_descendents,
               cte.sort_path || array[row_number() OVER (ORDER BY nlevel(n.node_path), n.score, n.node_path)] AS sort_path
          FROM cte
          JOIN nodes n
            ON cte.get_descendents AND
               cte.node_path @> n.node_path AND
               cte.node_path <> n.node_path),
distinct_node_paths AS (
  SELECT DISTINCT ON (cte.node_path) cte.*
    FROM cte
    ORDER BY cte.node_path, cte.sort_path)
SELECT dnp.node_path, dnp.score, dnp.sort_path
  FROM distinct_node_paths dnp
  ORDER BY dnp.sort_path;

Having both .* at the end of search_path and get_descendents set to true is redundant (either is sufficient to get the descendents).