My actual use case is similar to how comment threads work on Reddit for example. If I do the query based on the LTREE path for the post itself, then it works fine and brings all the child comments. But if I do the query on permalink of a nested child comment, then I want it to return both the parent post and also all the children from that child comment onwards.
My actual query is fairly complex but I am able to demonstrate the issue with this simple example.
I have the following table:
CREATE TABLE nodes (node_path ltree,score INTEGER);
INSERT INTO nodes VALUES ('Top',150);
INSERT INTO nodes VALUES ('Top.Science',23);
INSERT INTO nodes VALUES ('Top.Science.Astronomy',5);
INSERT INTO nodes VALUES ('Top.Science.Astronomy.Astrophysics',43);
INSERT INTO nodes VALUES ('Top.Science.Astronomy.Cosmology',23);
INSERT INTO nodes VALUES ('Top.Hobbies',16);
INSERT INTO nodes VALUES ('Top.Hobbies.Amateurs_Astronomy',2);
INSERT INTO nodes VALUES ('Top.Collections',65);
INSERT INTO nodes VALUES ('Top.Collections.Pictures',88);
INSERT INTO nodes VALUES ('Top.Collections.Pictures.Astronomy',36);
INSERT INTO nodes VALUES ('Top.Collections.Pictures.Astronomy.Stars',97);
INSERT INTO nodes VALUES ('Top.Collections.Pictures.Astronomy.Galaxies',58);
INSERT INTO nodes VALUES ('Top.Collections.Pictures.Astronomy.Astronauts',6);
INSERT INTO nodes VALUES ('Top.Dislikes',28);
CREATE INDEX ON nodes (score);
CREATE INDEX ON nodes USING GIST (node_path);
CREATE INDEX ON nodes USING BTREE (node_path);
I need to query the data such that Top.Science and Top.Collections nodes are present plus I also want all the nodes which are or are descendants of the path *.Top.Collections.Pictures.Astronomy.*. I basically want it to not do recursion for the top 2 rows and do it for the children ones.
The correct output I want is:
node_path | score | sort_path
-----------------------------------------------+-------+-----------
Top.Science | 23 | {1}
Top.Collections | 65 | {2}
Top.Collections.Pictures.Astronomy | 36 | {2,2}
Top.Collections.Pictures.Astronomy.Astronauts | 6 | {2,2,1}
Top.Collections.Pictures.Astronomy.Galaxies | 58 | {2,2,2}
Top.Collections.Pictures.Astronomy.Stars | 97 | {2,2,3}
The first query I tried was:
with recursive
t as (select * from nodes where node_path = ANY('{Top.Science,Top.Collections}') OR node_path ~ '*.Top.Collections.Pictures.Astronomy.*'::lquery),
l as (select nlevel(node_path) n from t order by n limit 1),
base as (
select node_path, score,
array[row_number() over (order by score ASC, node_path)] as sort_path
from t, l
where nlevel(node_path) = l.n
union all
select c.node_path, c.score,
p.sort_path||row_number() over (order by c.score ASC, c.node_path)
from base p
join t c
on subpath(c.node_path, 0, -1) = p.node_path
)
select * from base order by sort_path;
This gives me only the following data:
node_path | score | sort_path
-----------------+-------+-----------
Top.Science | 23 | {1}
Top.Collections | 65 | {2}
(2 rows)
I understand the = in the subpath(c.node_path, 0, -1) = p.node_path is what causes only these 2 rows to be returned.
So I changed the = to <@ as follows:
with recursive
t as (select * from nodes where node_path = ANY('{Top.Science,Top.Collections}') OR node_path ~ '*.Top.Collections.Pictures.Astronomy.*'::lquery),
l as (select nlevel(node_path) n from t order by n limit 1),
base as (
select node_path, score,
array[row_number() over (order by score ASC, node_path)] as sort_path
from t, l
where nlevel(node_path) = l.n
union all
select c.node_path, c.score,
p.sort_path||row_number() over (order by c.score ASC, c.node_path)
from base p
join t c
on subpath(c.node_path, 0, -1) <@ p.node_path
)
select * from base order by sort_path;
This gives me the following:
node_path | score | sort_path
-----------------------------------------------+-------+-----------
Top.Science | 23 | {1}
Top.Collections | 65 | {2}
Top.Collections.Pictures.Astronomy.Astronauts | 6 | {2,1}
Top.Collections.Pictures.Astronomy | 36 | {2,2}
Top.Collections.Pictures.Astronomy.Astronauts | 6 | {2,2,1}
Top.Collections.Pictures.Astronomy.Galaxies | 58 | {2,2,2}
Top.Collections.Pictures.Astronomy.Stars | 97 | {2,2,3}
Top.Collections.Pictures.Astronomy.Galaxies | 58 | {2,3}
Top.Collections.Pictures.Astronomy.Stars | 97 | {2,4}
(9 rows)
While now all the rows are there, the children rows are repeated. I think it's due to the recursion but I am having a hard time understanding why and how to fix it.
Note that if I remove the WHERE for the top 2 rows and only query for the children ones, then it gives me the children as I expect:
with recursive
t as (select * from nodes where node_path ~ '*.Top.Collections.Pictures.Astronomy.*'::lquery),
l as (select nlevel(node_path) n from t order by n limit 1),
base as (
select node_path, score,
array[row_number() over (order by score ASC, node_path)] as sort_path
from t, l
where nlevel(node_path) = l.n
union all
select c.node_path, c.score,
p.sort_path||row_number() over (order by c.score ASC, c.node_path)
from base p
join t c
on subpath(c.node_path, 0, -1) = p.node_path
)
select * from base order by sort_path;
One solution I do have in mind is to query the top 2 rows separately, query the children separately, then UNION then together. However, that seems unnecessarily complicated.
EDIT: My question seems similar to how to get the tree where some of the rows are missing. For example, how to get the tree of the following data:
node_path | score
-----------------------------------------------+-------
Top.Science | 23
Top.Collections | 65
Top.Collections.Pictures.Astronomy | 36
Top.Collections.Pictures.Astronomy.Astronauts | 6
Top.Collections.Pictures.Astronomy.Galaxies | 58
Top.Collections.Pictures.Astronomy.Stars | 97
Notice how the Top.Collections.Pictures path is missing in this.
The following doesn't produce the exact output described in the original post; however, it demonstrates an approach that can be used to selectively recurse:
Having both
.*at the end ofsearch_pathandget_descendentsset totrueis redundant (either is sufficient to get the descendents).