How can I make the iterative DFS faster than the recursive implementation, as is usually expected?


While implementing DFS in Python in several ways and measuring their performance, I discovered that the recursive DFS is actually faster. As far as I know, recursion usually performs worse than iteration because of the function-call overhead, but that did not seem to be the case with DFS. My guess is that Python represents the explicit stack as a list internally, which may cost significant time. Does this mean that in Python we cannot use the stack as efficiently as in C, from a computer-architecture perspective? I tried to improve the stack-based DFS by using a deque, but the recursive DFS was still faster.
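To check how much the stack container itself costs, here is a quick micro-benchmark sketch I put together (push_pop is just an illustrative helper, separate from the DFS timings below) comparing list and deque as a LIFO stack; both give amortized O(1) append and pop at the right end:

import timeit
from collections import deque

def push_pop(stack):
    # push 1000 items, then pop them all (right-end LIFO use only)
    for i in range(1000):
        stack.append(i)
    for _ in range(1000):
        stack.pop()

print(timeit.timeit(lambda: push_pop([]), number=1000))       # list as a stack
print(timeit.timeit(lambda: push_pop(deque()), number=1000))  # deque as a stack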

My Code

from collections import deque
import time

def rec_dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    #print(start)
    for i in graph[start]:
        if i not in visited:
            rec_dfs(graph, i, visited)

def itr_dfs(graph, start):
    visited = set()
    stack = deque([start])
    while stack:
        node = stack.pop()  # pop the last element; it becomes the node to explore
        if node not in visited:
            visited.add(node)
            #print(node)  # report that the node was visited
            # push the adjacent nodes in reverse order so they are
            # popped in the original order
            stack.extend(reversed(graph[node]))

# Represent the graph as an adjacency list
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': ['H', 'I'],
    'E': ['J', 'K'],
    'F': ['L', 'M'],
    'G': ['N', 'O'],
    'H': ['P', 'Q'],
    'I': ['R', 'S'],
    'J': ['T', 'U'],
    'K': ['V', 'W'],
    'L': ['X', 'Y'],
    'M': ['Z', 'A1'],
    'N': ['B1', 'C1'],
    'O': ['D1', 'E1'],
    'P': [],
    'Q': [],
    'R': [],
    'S': [],
    'T': [],
    'U': [],
    'V': [],
    'W': [],
    'X': [],
    'Y': [],
    'Z': [],
    'A1': [],
    'B1': [],
    'C1': [],
    'D1': [],
    'E1': []
}

# Run DFS from the given start node
start_node = 'A'

an = input()  # pause until Enter is pressed before timing
t1 = time.time()
for i in range(10000):
    rec_dfs(graph, 'A')
t2 = time.time()
print(t2 - t1)

ans = input()  # pause again before the second measurement
t3 = time.time()
for i in range(10000):
    itr_dfs(graph, 'A')
t4 = time.time()
print(t4 - t3)

Result:

0.05646324157714844

0.07860183715820312

Answer (by trincot):

Both your implementations use a stack. The first uses the call stack, while the other uses an explicit stack. There are several differences that impact performance:

  1. The call stack is managed implicitly, by compiled code (in CPython), which gives it an advantage over a stack that must be managed with Python-level code (extend, pop).

  2. The second implementation does not translate the recursive pattern one-to-one. It pushes all of a node's children onto the stack in advance, including those that will only be visited after the first child's subtree has been explored, so the stack footprint can be larger than in the recursive version (see the sketch after this list). Besides the overhead of the reversed iterator, we also have to take into account that memory allocation costs time.
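To illustrate the second point, here is a small instrumented sketch (itr_dfs_peak is an illustrative helper, not code from the question) that records the peak length of the explicit stack:

def itr_dfs_peak(graph, start):
    # same traversal as itr_dfs, but also track the largest stack size
    visited = set()
    stack = deque([start])
    peak = 1
    while stack:
        node = stack.pop()
        if node not in visited:
            visited.add(node)
            stack.extend(reversed(graph[node]))
            peak = max(peak, len(stack))
    return peak

# On this narrow binary tree the peak roughly matches the recursion depth,
# but with wider branching it grows, because whole sibling lists sit on the
# stack waiting while the first subtree is explored.
print(itr_dfs_peak(graph, 'A'))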

We cannot do much about the first point, so a stack-based iterative implementation is not expected to beat the recursive one. Still, to tackle the second point, we can write an iterative version that more closely mimics the recursive one:

def itr_dfs2(graph, start):
    visited = set()
    stack = deque()            # stack of partially consumed iterators
    current = iter([start])
    try:
        while True:
            for node in current:
                if node not in visited:
                    visited.add(node)
                    stack.append(current)        # suspend this level
                    current = iter(graph[node])  # descend to the children
                    break
            else:  # current level exhausted: resume the parent iterator
                current = stack.pop()
    except IndexError:
        pass  # popping from an empty stack signals the end of the traversal

Here the stack consists of iterators, not nodes. This more closely resembles the recursive implementation, where the state of each for loop is part of a stack frame, and the loop is resumed once the recursive call returns to it.
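The key mechanism is that a Python iterator remembers its position, so a partially consumed iterator can be stashed on the stack and resumed later. For instance:

it = iter(['B', 'C'])
next(it)         # consume 'B', as if descending into B's subtree
print(list(it))  # ['C'] -- the iterator resumes where it left off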

To time your code it is better to use timeit and take the minimum of several measurements (using timeit.repeat), so as to reduce the effect of other load on the machine it runs on. So I changed the time-measuring code to this:

import timeit

n = 10000  # number of calls per measurement
r = 5      # number of measurements; keep the best one
t1 = min(timeit.repeat(lambda: rec_dfs(graph, 'A'), repeat=r, number=n))
t2 = min(timeit.repeat(lambda: itr_dfs(graph, 'A'), repeat=r, number=n))
t3 = min(timeit.repeat(lambda: itr_dfs2(graph, 'A'), repeat=r, number=n))
print(t1)
print(t2)
print(t3)

The results I got showed that t3 was often (but not always) somewhere between t1 and t2.