Graph analysis: Identify loop paths

2.5k Views Asked by At

I have an undirected graph like:

import igraph as ig
G = ig.Graph()
G.add_vertices(9)
G.add_edges([(0,1), (1,2),(2,3),(3,0),(0,4),(4,5),(5,6),(6,7),(6,8)])

There is a "loop path" from node 0 via 1,2,3 back to 0 (sorry, in the pic the nodelabels are starting with 1 instead of 0)

Graph

For an assignment, I need to identify the node where the "loop path" is connected to the rest of the graph, i.e. 0, and most importantly, the "loop path" itself, i.e. [0,1,2,3,0] and/or [0,3,2,1,0]

I am trying hard, but really have no clue. If I use the function from here like find_all_paths(G,0,0) I of course get only [0]

4

There are 4 best solutions below

2
On BEST ANSWER

OK, this is part one of the answer to my own question:

Thanks to Max Li's and deeenes' help, ihad the idea to rewrite the networkx cycle_basis function to work in python_igraph:

import igraph as ig
G = ig.Graph()
G.add_vertices(9)
G.add_edges([(0,1), (1,2),(2,3),(3,0),(0,4),(4,5),(5,6),(6,7),(6,8)])

def cycle_basis_ig(G,root=None):
gnodes=set(n.index for n in G.vs())
cycles=[]
while gnodes:  # loop over connected components
    if root is None:
        root=gnodes.pop()
    stack=[root]
    pred={root:root} 
    used={root:set()}
    while stack:  # walk the spanning tree finding cycles
        z=stack.pop() # use last-in so cycles easier to find
        zused=used[z]
        for nbr in G.neighbors(z,mode='ALL'):
            if nbr not in used:   # new node 
                pred[nbr]=z
                stack.append(nbr)
                used[nbr]=set([z])
            elif nbr is z:        # self loops
                cycles.append([z]) 
            elif nbr not in zused:# found a cycle
                pn=used[nbr]
                cycle=[nbr,z]
                p=pred[z]
                while p not in pn:
                    cycle.append(p)
                    p=pred[p]
                cycle.append(p)
                cycles.append(cycle)
                used[nbr].add(z)
    gnodes-=set(pred)
    root=None
return cycles


cb = cycle_basis_ig(G)
print 'cycle_basis_ig: ', cb
0
On

Since the question was also tagged with networkx, I use it to exemplify the code.

In graph theory "loop paths" are usually called cycles.

The simplest (probably not the fastest) idea I see is to find the cycles and the set of articulation points (or cut verteces, i.e. points that increase the number of connected components) and then their intersection would be the solution.

To start on the same basis:

import networkx as nx
G.add_nodes_from([9])
G.add_edges_from([(0,1), (1,2),(2,3),(3,0),(0,4),(4,5),(5,6),(6,7),(6,8)])

Now the problem's solution:

cycles = nx.cycle_basis(G) # list of cycles
cuts = list(nx.articulation_points(G)) # list of cut verteces
nodes_needed = set() # the set of nodes we are looking for
for cycle in cycles:
    for node in cycle:
        if node in cuts:
            nodes_needed.add(node)
2
On

Here is an example to find the loops with breadth first search. I am wondering if more efficient method exists. In case of even moderately large graphs, or longer maximum loop length, this might run for ages. Same could be done with depth first search. First I believed you posted the question using R, so find below also the R version. The python version is not perfectly pythonic for the same reason, as I quickly translated from R. For explanation, see the comments in the code.

import igraph

# creating a toy graph
g = igraph.Graph.Erdos_Renyi(n = 100, p = 0.04)

# breadth first search of paths and unique loops
def get_loops(adj, paths, maxlen):
    # tracking the actual path length:
    maxlen -= 1
    nxt_paths = []
    # iterating over all paths:
    for path in paths['paths']:
        # iterating neighbors of the last vertex in the path:
        for nxt in adj[path[-1]]:
            # attaching the next vertex to the path:
            nxt_path = path + [nxt]
            if path[0] == nxt and min(path) == nxt:
                # the next vertex is the starting vertex, we found a loop
                # we keep the loop only if the starting vertex has the 
                # lowest vertex id, to avoid having the same loops 
                # more than once
                paths['loops'].append(nxt_path)
                # if you don't need the starting vertex 
                # included at the end:
                # paths$loops <- c(paths$loops, list(path))
            elif nxt not in path:
                # keep the path only if we don't create 
                # an internal loop in the path
                nxt_paths.append(nxt_path)
    # paths grown by one step:
    paths['paths'] = nxt_paths
    if maxlen == 0:
        # the final return when maximum search length reached
        return paths
    else:
        # recursive return, to grow paths further
        return get_loops(adj, paths, maxlen)

adj = []
loops = []
# the maximum length to limit computation time on large graphs
# maximum could be vcount(graph), but that might take for ages
maxlen = 4

# creating an adjacency list
# for directed graphs use the 'mode' argument of neighbors() 
# according to your needs ('in', 'out' or 'all')
adj = [[n.index for n in v.neighbors()] for v in g.vs]

# recursive search of loops 
# for each vertex as candidate starting point
for start in xrange(g.vcount()):
    loops += get_loops(adj, 
        {'paths': [[start]], 'loops': []}, maxlen)['loops']

Same in R:

require(igraph)

# creating a toy graph
g <- erdos.renyi.game(n = 100, p.or.m = 0.04)

# breadth first search of paths and unique loops
get_loops <- function(adj, paths, maxlen){
    # tracking the actual path length:
    maxlen <- maxlen - 1
    nxt_paths <- list()
    # iterating over all paths:
    for(path in paths$paths){
        # iterating neighbors of the last vertex in the path:
        for(nxt in adj[[path[length(path)]]]){
            # attaching the next vertex to the path:
            nxt_path <- c(path, nxt)
            if(path[1] == nxt & min(path) == nxt){
                # the next vertex is the starting vertex, we found a loop
                # we keep the loop only if the starting vertex has the 
                # lowest vertex id, to avoid having the same loops 
                # more than once
                paths$loops <- c(paths$loops, list(nxt_path))
                # if you don't need the starting vertex included 
                # at the end:
                # paths$loops <- c(paths$loops, list(path))
            }else if(!(nxt %in% path)){
                # keep the path only if we don't create 
                # an internal loop in the path
                nxt_paths <- c(nxt_paths, list(nxt_path))
            }
        }
    }
    # paths grown by one step:
    paths$paths <- nxt_paths
    if(maxlen == 0){
        # the final return when maximum search length reached
        return(paths)
    }else{
        # recursive return, to grow paths further
        return(get_loops(adj, paths, maxlen))
    }
}

adj <- list()
loops <- list()
# the maximum length to limit computation time on large graphs
# maximum could be vcount(graph), but that might take for ages
maxlen <- 4

# creating an adjacency list
for(v in V(g)){
    # for directed graphs use the 'mode' argument of neighbors() 
    # according to your needs ('in', 'out' or 'all')
    adj[[as.numeric(v)]] <- neighbors(g, v)
}

# recursive search of loops 
# for each vertex as candidate starting point
for(start in seq(length(adj))){
    loops <- c(loops, get_loops(adj, list(paths = list(c(start)), 
        loops = list()), maxlen)$loops)
}
0
On

For big graph and directed graph @deeenes' answer is correct, and the python version is O.K., but the R version has a bottleneck with copying lists times and times again, I address the performance problem by modifying it in this way:

# breadth first search of paths and unique loops
get_loops <- function(adj, paths, maxlen) {
  # tracking the actual path length:
  maxlen <- maxlen - 1
  nxt_paths <- list()
  # count of loops and paths in the next step, avoid copying lists that cause        performance bottleneck.
  if (is.null(paths$lc))
    paths$lc <- 0
  paths$pc <- 0
  # iterating over all paths:
  for (path in paths$paths) {
    # iterating neighbors of the last vertex in the path:
    for (nxt in adj[[path[length(path)]]]) {
      # attaching the next vertex to the path:
      nxt_path <- c(path, nxt)
      if (path[1] == nxt & min(path) == nxt) {
        # the next vertex is the starting vertex, we found a loop
        # we keep the loop only if the starting vertex has the
        # lowest vertex id, to avoid having the same loops
        # more than once
        paths$lc <- paths$lc + 1
        paths$loops[paths$lc] <- list(nxt_path)
        # if you don't need the starting vertex included
        # at the end:
        # paths$loops <- c(paths$loops, list(path))
        # cat(paste(paths$loops,collapse=","));cat("\n")
      } else if (!(nxt %in% path)) {
        # keep the path only if we don't create
        # an internal loop in the path
        paths$pc <- paths$pc + 1
        nxt_paths[paths$pc] <- list(nxt_path)
      }
    }
  }
  # paths grown by one step:
  paths$paths <- nxt_paths
  if (maxlen == 0) {
    # the final return when maximum search length reached
    return(paths)
  } else{
    # recursive return, to grow paths further
    return(get_loops(adj, paths, maxlen))
  }
}