Called with n = 10**8, the simple loop is consistently significantly slower for me than the complex one, and I don't see why:
def simple(n):
    while n:
        n -= 1

def complex(n):
    while True:
        if not n:
            break
        n -= 1
Some times in seconds:
simple 4.340795516967773
complex 3.6490490436553955
simple 4.374553918838501
complex 3.639145851135254
simple 4.336690425872803
complex 3.624480724334717
Python: 3.11.4 (main, Sep 9 2023, 15:09:21) [GCC 13.2.1 20230801]
Here's the looping part of the bytecode as shown by dis.dis(simple):
  6     >>    6 LOAD_FAST                    0 (n)
              8 LOAD_CONST                   1 (1)
             10 BINARY_OP                   23 (-=)
             14 STORE_FAST                   0 (n)

  5          16 LOAD_FAST                    0 (n)
             18 POP_JUMP_BACKWARD_IF_TRUE    7 (to 6)
And for complex:
 10     >>    4 LOAD_FAST                    0 (n)
              6 POP_JUMP_FORWARD_IF_TRUE     2 (to 12)

 11           8 LOAD_CONST                   0 (None)
             10 RETURN_VALUE

 12     >>   12 LOAD_FAST                    0 (n)
             14 LOAD_CONST                   2 (1)
             16 BINARY_OP                   23 (-=)
             20 STORE_FAST                   0 (n)

  9          22 JUMP_BACKWARD               10 (to 4)
So it looks like the complex one does more work per iteration (two jumps instead of one). Then why is it faster?
This seems to be specific to Python 3.11; see the comments.
Benchmark script (Attempt This Online!):
from time import time
import sys

def simple(n):
    while n:
        n -= 1

def complex(n):
    while True:
        if not n:
            break
        n -= 1

for f in [simple, complex] * 3:
    t = time()
    f(10**8)
    print(f.__name__, time() - t)

print('Python:', sys.version)
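As a cross-check on the timings, here is a minimal sketch of the same comparison using the standard library's timeit.repeat and taking the best of several runs, which tends to be less sensitive to background noise than a single time() difference. The helper name best_time and the smaller default n are my own choices for illustration, not part of the original script; raise n to 10**8 to match the numbers above.

```python
import sys
from timeit import repeat

def simple(n):
    while n:
        n -= 1

def complex(n):
    while True:
        if not n:
            break
        n -= 1

def best_time(func, n, repeats=3):
    # Hypothetical helper: run func(n) `repeats` times, one call per
    # measurement, and return the fastest wall-clock time in seconds.
    return min(repeat(lambda: func(n), number=1, repeat=repeats))

# n is kept small here so the sketch finishes quickly.
for f in (simple, complex):
    print(f.__name__, best_time(f, 10**6))

print('Python:', sys.version)
```

Taking the minimum rather than the mean is a design choice: it estimates how fast the loop can run when nothing else interferes.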
I checked the CPython source (Python 3.11.6) and found that, of the instructions in the disassembly above, it seems that only JUMP_BACKWARD executes a warmup function, which triggers specialization in Python 3.11 once it has been executed enough times. Among all bytecodes, only JUMP_BACKWARD and RESUME call _PyCode_Warmup().
Specialization appears to speed up several of the bytecodes used, resulting in a significant increase in speed.
After executing once, the bytecode of complex changed, while that of simple did not.
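One way to check this yourself is to compare the normal disassembly with the adaptive one after a single call. The sketch below assumes Python 3.11 or later, where dis.get_instructions accepts adaptive=True; the helpers opnames and specialized_names are names I made up for illustration. If the explanation above holds, on 3.11 I would expect complex to report some specialized opcodes and simple to report none, but the result depends on the interpreter version, so no output is shown here.

```python
import dis
import sys

def simple(n):
    while n:
        n -= 1

def complex(n):
    while True:
        if not n:
            break
        n -= 1

def opnames(func, adaptive):
    # Hypothetical helper: opcode names of func. adaptive=True asks dis for
    # the quickened/specialized form; the parameter only exists on 3.11+.
    if sys.version_info >= (3, 11):
        return [ins.opname for ins in dis.get_instructions(func, adaptive=adaptive)]
    return [ins.opname for ins in dis.get_instructions(func)]

def specialized_names(func, n=10_000):
    # Hypothetical helper: call func once, then list the opcode names that
    # appear only in the adaptive disassembly.
    func(n)
    return sorted(set(opnames(func, True)) - set(opnames(func, False)))

for f in (simple, complex):
    print(f.__name__, specialized_names(f))
```

Calling dis.dis(complex, adaptive=True) after one run prints the same information in the usual disassembly layout.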