Trying to find the longest induced path using DFS in a graph

575 Views Asked by At

here is the code I am currently applying that is a DFS algorithm in order to find the longest induced path (the longest 'snake') of a graph:

def dfs(node, adj, dp, vis): 
    # Mark as visited
    vis[node] = True
    #print(vis[adj[node][0]])
    # Traverse for all its children
    for i in range(0, len(adj[node])):
        # If not visited
        if vis[adj[node][i]]==False:
            dfs(adj[node][i], adj, dp, vis) 
        # Store the max of the paths
        dp[node] = max(dp[node], [node] + dp[adj[node][i]], key=lambda x: len(x))
    print(dp[node])
        
        
# Function that returns the longest path 
def findLongestPath(adj, n): 
    # Dp array 
    dp = [[x] for x in range(n)]
    # Visited array to know if the node 
    # has been visited previously or not 
    vis = [False] * (n)
    # Call DFS for every unvisited vertex 
    for i in range(n):
        if vis[i]==False: 
            dfs(i, adj, dp, vis) 
    # Traverse and find the maximum of all dp[i]
    return max([dp[i] for i in range(n)], key=lambda x: len(x))


# The adjacency list of the graph
adj = [[1, 4], [0, 2], [1, 3, 4], [2, 4], [0, 2, 3]]

final = findLongestPath(adj, len(adj))
print(final)

The main issue I assume remains in my DFS algorithm:

def dfs(node, adj, dp, vis): 

    # Mark as visited
    vis[node] = True
    #print(vis[adj[node][0]])
    # Traverse for all its children
    for i in range(0, len(adj[node])):
        # If not visited
        if vis[adj[node][i]]==False:
            dfs(adj[node][i], adj, dp, vis) 
        # Store the max of the paths
        dp[node] = max(dp[node], [node] + dp[adj[node][i]], key=lambda x: len(x))
    print(dp[node])

I tried to print all the longest paths starting from each node at the end, as you can see print(dp[node]), and here are the outputs:

[4, 3, 2, 1, 0]
[3, 4, 3, 2, 1, 0]
[2, 3, 4, 3, 2, 1, 0]
[1, 2, 3, 4, 3, 2, 1, 0]
[0, 1, 2, 3, 4, 3, 2, 1, 0]

Although this is nice, seeing all the longest paths starting from each node, but this is not what I wanted as I want the longest induced paths. Such as the longest induced path for this graph, that one example might be:

[0, 1, 2, 3]

Instead of the one, I got in my output:

[0, 1, 2, 3, 4, 3, 2, 1, 0]

Node 4 is not included because if we include 4, there will be an edge from 0-4 which wouldn't make an induced path.

Is there a way to modify my Dfs algorithm that would output only the longest induced paths instead of all the longest paths? What should I change in my Dfs function? Thanks!

1

There are 1 best solutions below

11
Grismar On

From your question and comment it's not clear if you correctly understand what a 'longest induced path' is, so here's an example I wrote that can get you either the first longest induced path found or all of them (which is convenient to get a sense of what it is):

def longest_induced_path(adj, all_longest=False):
    def _dfs(node, path, off_limits):
        yield (path := path + [node])
        for next_node in adj[node]:
            if next_node not in off_limits:
                yield from _dfs(next_node, path, off_limits | set(adj[node]))

    def _first():
        lp = []
        for n in range(len(adj)):
            for p in _dfs(n, [], {n}):
                if len(p) > len(lp):
                    lp = p
        return lp

    def _all():
        lp = []
        for n in range(len(adj)):
            for p in _dfs(n, [], {n}):
                if not lp or len(p) > len(lp[0]):
                    lp = [p]
                elif len(p) == len(lp[0]):
                    lp.append(p)
        return lp

    return _all() if all_longest else _first()


def main():
    adj = [{1, 4}, {0, 2}, {1, 3, 4}, {2, 4}, {0, 2, 3}]

    print(longest_induced_path(adj))
    print(longest_induced_path(adj, all_longest=True))


main()

Output:

[0, 1, 2, 3]
[[0, 1, 2, 3], [1, 0, 4, 3], [3, 2, 1, 0], [3, 4, 0, 1]]

The graph looks something like this:

0 - 4 - 3
|   | /
1 - 2 

So, you can see that it finds every longest path P in the graph G so that:

  • each vertex A that's a neighbour of any vertex B in P, is also the neighbour of B in G;
  • no two vertices A and B in P are neighbours in G unless they are also neighbours in P.

As an example of that second constraint in action(making P a longest induced path): [0, 1, 2, 4, 3] is not an induced path, because both 0 and 4 are in it, but they are neighbours in G, while they are not neighbours in P, which the constraint forbids.

Of course, if you only require the result of _first(), you can just simplify your code - I included _all() and the option to call it to provide the example of all longest paths.

A different (if perhaps harder to read) version of _dfs:

    def _dfs(node, path, off_limits):
        if adj[node].issubset(off_limits):
            yield path + [node]
        else:
            for next_node in adj[node] - off_limits:
                yield from _dfs(next_node, path + [node], off_limits | set(adj[node]))

It's arguable simpler and prevents more calls to itself, but performs almost identical to the other implementation when timed. It's possible that for more complicated graphs one wins out over the other, but you get to find out which yourself.

Edit: updated initial value for off_limits to {n} instead of {set} to avoid cyclical return values.

Note: when using a version of Python before 3.8, you want this as the first lines of _dfs:

        path = path + [node]
        yield path