PostgreSQL SQL Query to return all graph nodes (the connected components), provided a node to query upon

1.7k Views Asked by At

In my Postgresql DB I have one or more directed acyclic graphs, represented by the following tables:

Tables:

nodes_table
    id

edges_table
    id
    parent_id -- a nodes_table id
    child_id -- a nodes_table id

Assuming I have the following data, here is an example with 3 separate graphs stored in the database. Note, there is no connection between any of these three graphs with each other:

nodes_table
1
2
3
4
5
6
7
8
9
10
11
12
13

edges_table
1, 1, 2
2, 1, 5
3, 2, 3
4, 3, 4
5, 5, 10
6, 9, 5
7, 9, 11
8, 3, 10
9, 6, 7
10, 8, 12
11, 8, 13

Output (3 separate graph structures, with the roots at the top):
 
  1   9         6       8
 / \ / \        |      / \
2   5   11      7     12 13
|   |
|   |
3   |
| \ |
4   10

Is there a way that I can get all nodes connected in any way (including through other nodes, up and down a given graph) provided a query node?

If I query any node in the leftmost graph, I want my query to return all nodes in that graph (excluding any nodes from other graphs in the database)

Maybe another way to think of this is that for this query I want to treat the graph as if it were an undirected graph, and return all nodes that have any path to the queried node.

I found this answer, which finds all of the connected subgraphs in a similar table structure, but it seems overboard for what I'm trying to accomplish. I want to return just one subgraph - specifically the one which includes the node I want to query upon. I suppose I could filter the output for the node I'm interested in, but that seems inefficient - searching the entire set of graphs and then just returning one. There has got to be a better way, but I'm still pretty new to SQL.

Here's a similar question regarding graphs in R, and an approach using NetworkX, but I really want to do this in the database and not in external code.


Edit: I also found this answer to a similar question, but it doesn't seem to work as expected. If I select node 1 as the starting point, all's good. But other nodes within the same graph don't provide the expected output of all the nodes in that graph. Here's an sqlfiddle based on that answer. Thoughts? Am I missing something?

1

There are 1 best solutions below

1
On BEST ANSWER

You can use a recursive CTE to get all related nodes.

For example:

with recursive
n as (
  select 5 as id -- starting node
 union
  select case when e.child_id = n.id then e.parent_id else e.child_id end
  from n
  join edges_table e on e.parent_id = n.id or e.child_id = n.id
)
select * from n

Result:

id 
-- 
5  
1  
10 
9  
2  
3  
11 
4  

See running example at DB Fiddle.