While implementing DFS in Python in several ways and measuring their performance, I found that the recursive DFS was relatively faster. As far as I know, recursion usually performs worse than iteration because of function-call overhead, but that did not seem to be the case with DFS. My guess is that Python represents the explicit stack as a list internally, which may cost significant time. Does this mean that in Python we cannot use the stack as efficiently, from a computer-architecture perspective, as we can in C? I tried to improve the stack-based DFS by using a deque, but the recursive DFS was still faster.
My Code
```python
from collections import deque
import time


def rec_dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    # print(start)
    for i in graph[start]:
        if i not in visited:
            rec_dfs(graph, i, visited)


def itr_dfs(graph, start):
    visited = set()
    stack = deque([start])
    while stack:
        node = stack.pop()  # pop the last element as the node to explore
        if node not in visited:
            visited.add(node)
            # print(node)  # report that the node was visited
            # push the node's adjacent nodes onto the stack in reverse order
            stack.extend(reversed(graph[node]))


# represent the graph as an adjacency list
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': ['H', 'I'],
    'E': ['J', 'K'],
    'F': ['L', 'M'],
    'G': ['N', 'O'],
    'H': ['P', 'Q'],
    'I': ['R', 'S'],
    'J': ['T', 'U'],
    'K': ['V', 'W'],
    'L': ['X', 'Y'],
    'M': ['Z', 'A1'],
    'N': ['B1', 'C1'],
    'O': ['D1', 'E1'],
    'P': [],
    'Q': [],
    'R': [],
    'S': [],
    'T': [],
    'U': [],
    'V': [],
    'W': [],
    'X': [],
    'Y': [],
    'Z': [],
    'A1': [],
    'B1': [],
    'C1': [],
    'D1': [],
    'E1': []
}

# run DFS from the specified start node
start_node = 'A'

an = input()
t1 = time.time()
for i in range(10000):
    rec_dfs(graph, 'A')
t2 = time.time()
print(t2 - t1)

ans = input()
t3 = time.time()
for i in range(10000):
    itr_dfs(graph, 'A')
t4 = time.time()
print(t4 - t3)
```
Result:
0.05646324157714844
0.07860183715820312
Both your implementations use a stack. The first uses the call stack, while the other uses an explicit stack. There are several differences that impact performance:

- The call stack is managed implicitly, by compiled code (in CPython), which gives it an advantage over a stack that must be managed with Python code (`extend`, `pop`).
- The second implementation does not translate the recursive pattern one-to-one. It populates the stack in advance with nodes that only need to be visited after the first child's subtree has been visited, meaning the stack footprint can be larger than in the recursive version. Besides the overhead of the `reversed` iterator, we need to take into account that memory allocation also costs time.

We cannot do much about the first point, so a stack-based iterative implementation is not expected to do better. Still, to tackle the second point, we could try to write an iterative version that more closely mimics the recursive one:
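A minimal sketch of such an iterator-based version (the function name `itr_dfs2` is only illustrative):

```python
def itr_dfs2(graph, start):
    visited = {start}
    # The stack holds neighbour iterators rather than nodes, mirroring the
    # suspended for-loops that live in the recursive version's call stack.
    stack = [iter(graph[start])]
    while stack:
        try:
            node = next(stack[-1])       # advance the topmost "loop"
        except StopIteration:
            stack.pop()                  # neighbours exhausted: "return"
            continue
        if node not in visited:
            visited.add(node)
            stack.append(iter(graph[node]))  # "recurse" into this child
```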
Here the stack consists of iterators, not nodes. This more closely resembles the recursive implementation, where the state of each `for` loop iteration is part of a stack frame, and that loop gets resumed once a recursive call comes back to it.

To time your code it is better to make use of `timeit` and take the minimum of several measurements (using `timeit.repeat`), so as to reduce the effect of other load on the machine it runs on. So I changed the time-measuring code accordingly.
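A sketch of that measurement, assuming the iterator-based version above is named `itr_dfs2`:

```python
import timeit

# Uses rec_dfs, itr_dfs and graph from the question, plus the illustrative
# itr_dfs2 above. Taking the minimum of several repeats reduces the
# influence of other load on the machine.
t1 = min(timeit.repeat(lambda: rec_dfs(graph, 'A'), number=10000, repeat=5))
t2 = min(timeit.repeat(lambda: itr_dfs(graph, 'A'), number=10000, repeat=5))
t3 = min(timeit.repeat(lambda: itr_dfs2(graph, 'A'), number=10000, repeat=5))

print(t1, t2, t3)
```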
The results I got showed that `t3` was often (but not always) somewhere between `t1` and `t2`.