While doing some optimization tests, I decided to do a comparison of two Python profilers: Yappi and cProfile. I did so with the following python script:
import yappi
import cProfile
import string
import random
def setup():
strlist = []
for i in range(1000000):
strlist.append(''.join(random.choices(string.ascii_lowercase, k=10)))
return strlist
def execute(strlist):
new_strlist = []
for i in strlist:
new_strlist.append(i.upper())
return new_strlist
def main():
execute(setup())
yappi.set_clock_type("cpu")
yappi.start()
main()
yappi.stop()
yappi.get_func_stats().print_all()
yappi.get_thread_stats().print_all()
cProfile.run("main()")
I didn't expect them to be the same, as Yappi was measuring CPU instead of wall time, but Yappi suddenly started causing horrible slowdown. To demonstrate, here's the output of the code above:
Clock type: CPU
Ordered by: totaltime, desc
name ncall tsub ttot tavg
..p_optimization_baseline.py:20 main 1 0.042419 48.78300 48.78300
..p_optimization_baseline.py:7 setup 1 4.765043 43.22398 43.22398
..on3.8/random.py:386 Random.choices 100.. 4.687804 35.68567 0.000036
..python3.8/random.py:399 <listcomp> 100.. 16.40414 29.65616 0.000030
..ptimization_baseline.py:14 execute 1 2.818642 5.516596 5.516596
name id tid ttot scnt
_MainThread 0 140527431976768 **48.78303** 1
17000006 function calls in **4.693 seconds**
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 4.693 4.693 <string>:1(<module>)
1 0.182 0.182 0.342 0.342 loop_optimization_baseline.py:14(execute)
1 0.036 0.036 4.693 4.693 loop_optimization_baseline.py:20(main)
1 0.751 0.751 4.315 4.315 loop_optimization_baseline.py:7(setup)
1000000 0.674 0.000 3.332 0.000 random.py:386(choices)
1000000 1.968 0.000 2.580 0.000 random.py:399(<listcomp>)
1 0.000 0.000 4.693 4.693 {built-in method builtins.exec}
1000000 0.078 0.000 0.078 0.000 {built-in method builtins.len}
2000000 0.128 0.000 0.128 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000000 0.159 0.000 0.159 0.000 {method 'join' of 'str' objects}
10000000 0.612 0.000 0.612 0.000 {method 'random' of '_random.Random' objects}
1000000 0.106 0.000 0.106 0.000 {method 'upper' of 'str' objects}
And yes, it really was taking about 50 seconds IRL. When I set Yappi back to wall time, it started taking about 9 seconds instead. What is happening?
I am using Ubuntu 20.04, Python 3.8, and PyCharm Community Edition 2023.2.2. Yappi was installed through PyCharm's packages manager and is version 1.4.0, which is the latest.