I am currently writing a RESTful API in Python (Flask), served by uWSGI as the WSGI application server behind NGINX as a reverse proxy.
The principle is simple:
- The client sends a POST request to the server with a small JSON object in its body
- The request passes through NGINX and then uWSGI to my Python callable, which in turn calls a backend subprocess to handle the data processing (roughly as sketched below)
- The backend data processing is entirely CPU-bound, with no I/O. For a single JSON object, the raw processing time on an idle server is about 800 milliseconds
- The backend returns the raw data, which the Python callable serializes to JSON and sends back through uWSGI, then NGINX, and finally to the client
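For clarity, here is a stripped-down sketch of what the Python side does; the route name and the backend_binary call are placeholders, not my actual code:

from flask import Flask, request, jsonify
import json
import subprocess

app = Flask(__name__)

@app.route("/process", methods=["POST"])
def process():
    payload = request.get_json()
    # Hand the JSON payload to the CPU-bound backend as a subprocess and
    # block until it finishes (~800 ms on an idle server).
    completed = subprocess.run(
        ["backend_binary"],                     # placeholder for the real backend
        input=json.dumps(payload).encode(),
        capture_output=True,
        check=True,
    )
    # Serialize the backend's raw output and return it to the client.
    return jsonify(json.loads(completed.stdout))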
The server has 8 CPUs, each single-core and single-threaded.
NGINX is configured with an automatic number of worker processes (worker_processes auto, so it gets to 8 in this case) and is otherwise pretty vanilla.
Here's the uWSGI configuration:
[uwsgi]
strict = true
need-app = true
master = true
processes = 8
single-interpreter = true
enable-threads = false
max-requests = 1000
max-worker-lifetime = 7200
reload-on-rss = 1024
worker-reload-mercy = 60
harakiri = 60
disable-logging = true
log-4xx = true
log-5xx = true
logto = uwsgi.log
vacuum = true
Running a load test with JMeter (16 to 32 threads), the throughput of this API plateaus at 7 to 8 requests per second, and individual request latency climbs to more than 4000 ms (time to get a response at maximum load). During the test, all 8 CPUs of the server were fully used.
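For context, here are the rough numbers I had in mind, as a naive back-of-the-envelope sketch that assumes perfectly parallel, purely CPU-bound work and ignores serialization and proxy overhead:

workers = 8                # one uWSGI worker per CPU
service_time = 0.8         # seconds of CPU time per request on an idle server
concurrent_clients = 32    # JMeter threads at the high end of the test

# Theoretical ceiling with every worker busy all the time:
max_throughput = workers / service_time                 # ~10 requests/second

# With many more clients than workers, requests queue up, so response
# time should grow to roughly queue length divided by throughput:
expected_latency = concurrent_clients / max_throughput  # ~3.2 seconds

print(max_throughput, expected_latency)

So the 7 to 8 req/s plateau is in the right ballpark of that 10 req/s ceiling, but I want to make sure the latency blow-up is just queueing and not a configuration problem.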
Increasing the number of uWSGI processes (either as a fixed count or via the cheaper/cheaper-busyness options) actually seemed to hurt throughput and latency slightly; in any case, it certainly didn't gain anything.
Is this behaviour (especially individual request latency skyrocketing from 800 ms to more than 4000 ms) normal given my hardware and software configuration, or am I missing something?