On one of my machines the Python module requests is extremely slow, seemingly due to a very high number of internal function calls. What could lead to such an issue?
>>> import sys, cProfile, requests, urllib3
>>> sys.version
'3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0]'
>>> requests.__version__
'2.28.1' # same issue with latest 2.29.0 release
>>> cProfile.run('requests.get("http://127.0.0.1:8080")', sort='ncalls')
11717750 function calls (11717748 primitive calls) in 10.091 seconds
Ordered by: call count
ncalls tottime percall cumtime percall filename:lineno(function)
5220221 0.770 0.000 0.775 0.000 {method 'read' of '_io.TextIOWrapper' objects}
5220221 0.801 0.000 0.801 0.000 shlex.py:68(punctuation_chars)
468483 0.364 0.000 9.564 0.000 shlex.py:101(get_token)
401557 7.612 0.000 9.188 0.000 shlex.py:133(read_token)
200797 0.059 0.000 0.059 0.000 {method 'startswith' of 'str' objects}
66931 0.012 0.000 0.012 0.000 {method 'popleft' of 'collections.deque' objects}
66926 0.012 0.000 0.012 0.000 {method 'appendleft' of 'collections.deque' objects}
66926 0.049 0.000 0.061 0.000 shlex.py:72(push_token)
[..] # remaining calls with significantly lower numbers
>>> cProfile.run('http=urllib3.PoolManager(); http.request("GET", "http://127.0.0.1:8080")', sort='ncalls')
784 function calls (783 primitive calls) in 0.005 seconds
On another machines, running the same version of Python and tested with
same versions of requests, things look about right:
>>> cProfile.run('requests.get("http://127.0.0.1:8080")', sort='ncalls')
2432 function calls (2396 primitive calls) in 0.005 seconds
- Above tests were done using
python3 -m http.server 8080 --bind 127.0.0.1, but same results for other HTTP servers - Most of the time spent is before a TCP connection is even created by looking
at network traffic using
tcpdump - As with
urllib3, completely unrelated tools such ascurlworks just fine - While tests were made on IPv4 addresses, actively disabling IPv6 for
requestsusingrequests.packages.urllib3.util.connection.HAS_IPV6 = Falsedid not have any effect either - Tested with clean virtualenvs
Using the Python profile viewer tuna I found that
requestswill attempt to get authentication credentials from a netrc file ($HOME/.netrcor$HOME/_netrc) if no other authentication method is specified:Instead of just looking at number of calls (
ncalls), sorting by cumulative time (sort=cumtime) would also have helped highlight how the issuing ofnetrc.pyand subsequentshlex.pywere the main culprits.In my case I had a fairly large netrc file at thousands of entries. Upon trimming or removing the netrc file entirely the subsequent number of function calls shrunk accordingly.
Ideally the performance of the
netrcPython module wouldn't be such a bottleneck, as the file itself isn't all that complex, and other programs likecurlhas no issues handling it, but oh well.