I have the following code:

```python
import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/ping")
async def ping(request: Request):
    print("Hello")
    time.sleep(5)
    print("bye")
    return {"ping": "pong!"}
```
If I run my code on localhost (e.g., `http://localhost:8501/ping`) in different tabs of the same browser window, I get:

```
Hello
bye
Hello
bye
```

instead of:

```
Hello
Hello
bye
bye
```

I have read about using `httpx`, but still, I cannot achieve true parallelization. What's the problem?
As per FastAPI's documentation, and as also described here, in order to avoid blocking the server, `def` endpoints (in the context of asynchronous programming, a function defined with just `def` is called a synchronous function) will, in FastAPI, run in a separate thread from an external threadpool that is then `await`ed (more details on the external threadpool are given later on); hence, FastAPI will still work asynchronously. In other words, the server will process requests to such endpoints concurrently (but will spawn a new thread for every incoming request). `async def` endpoints, on the other hand, run directly in the `event loop`, which runs in the main (single) thread and is created when calling, for instance, `uvicorn.run()`, or the equivalent method of some other ASGI server. That is, the server will also process requests to such endpoints concurrently/asynchronously, as long as there is an `await` call to non-blocking I/O-bound operations inside such `async def` endpoints/routes, such as waiting for (1) data from the client to be sent through the network, (2) contents of a file on disk to be read, or (3) a database operation to finish (have a look here).

However, if an endpoint defined with `async def` does not `await` some coroutine inside (a coroutine object is the result of calling an `async def` function), in order to give up time for other tasks in the `event loop` to run (e.g., requests to the same or other endpoints, background tasks, etc.), each request to such an endpoint will have to be completely finished (i.e., exit the endpoint) before returning control back to the `event loop` and allowing other tasks in the `event loop` to run (see this answer, if you would like to get and monitor all pending tasks in an `event loop`). In other words, in such cases, the server would be "blocked" and would have to process requests sequentially. Having said that, you should still define an endpoint with `async def` if it does not have to execute a blocking operation inside and wait for it to respond, but is instead used to simply return JSON data, a `FileResponse`, an `HTMLResponse`, etc. Even if there is no `await` statement inside the endpoint in such cases, FastAPI would likely perform better when running such a simple endpoint directly in the event loop, rather than spawning a new thread from the external threadpool (in case the endpoint was instead defined with normal `def`).

Note that the same concept applies not only to endpoints, but also to functions that are used as `StreamingResponse`'s generators (see the `StreamingResponse` class implementation) or as Background Tasks (see the `BackgroundTask` class implementation and this answer), meaning that FastAPI, behind the scenes, will also run such functions, when defined with normal `def`, in a separate thread from the same external threadpool; whereas, if such functions were defined with `async def` instead, they would run directly in the `event loop`. In order to run an endpoint or a function described above in a separate thread and `await` it, FastAPI uses Starlette's asynchronous `run_in_threadpool()` function, which, under the hood, calls `anyio.to_thread.run_sync()`. The default number of worker threads of that external threadpool is `40` and can be adjusted as required; please have a look at this answer for more details on the external threadpool and how to adjust the number of threads. Hence, after reading this answer to the end, you should be able to decide whether you should define a FastAPI endpoint, `StreamingResponse`'s generator or `BackgroundTask` function with `def` or `async def`.

**Python's `async def` function and `await`**
The keyword `await` (which only works within an `async def` function) passes function control back to the `event loop`. In other words, it suspends the execution of the surrounding coroutine, and tells the `event loop` to let some other task run, until that `await`ed task is completed. Note that just because you may define a custom function with `async def` and then `await` it inside your `async def` endpoint, it doesn't mean that your code will work asynchronously, if that custom function contains, for example, calls to `time.sleep()`, CPU-bound tasks, non-async I/O libraries, or any other blocking call that is incompatible with asynchronous Python code. In FastAPI, for example, when using the `async` methods of `UploadFile`, such as `await file.read()` and `await file.write()`, FastAPI/Starlette, behind the scenes, actually calls the corresponding synchronous File methods in a separate thread from the external threadpool described earlier (using `run_in_threadpool()`) and `await`s it; otherwise, such methods/operations would block the `event loop`. You can find out more by looking at the implementation of the `UploadFile` class.

Note that `async` does not mean parallel, but concurrent. As mentioned earlier, asynchronous code with `async` and `await` is often summarised as using coroutines. Coroutines are collaborative (or cooperatively multitasked), meaning that "at any given time, a program with coroutines is running only one of its coroutines, and this running coroutine suspends its execution only when it explicitly requests to be suspended" (see here and here for more info on coroutines).
As described in this article, if a blocking I/O-bound or CPU-bound operation were directly executed/called inside an `async def` function/endpoint, it would block the event loop, and hence, the main thread would be blocked as well (the `event loop` runs in the main thread of a process/worker). Hence, a blocking operation such as `time.sleep()` in an `async def` endpoint would block the entire server (as in the code example provided in your question). Thus, if your endpoint is not going to make any `async` calls, you could declare it with normal `def` instead, in which case FastAPI would run it in a separate thread from the external threadpool and `await` it, as explained earlier (more solutions are given in the following sections). Example:
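A minimal sketch, adapting the code from your question to a normal `def` endpoint:

```python
import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/ping")
def ping(request: Request):
    print("Hello")
    time.sleep(5)  # blocking call, but it now runs in a thread from the external threadpool
    print("bye")
    return {"ping": "pong!"}
```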
Otherwise, if the functions that you have to execute inside the endpoint are `async` functions that you have to `await`, you should define your endpoint with `async def`. To demonstrate this, the example below uses the `asyncio.sleep()` function (from the `asyncio` library), which provides a non-blocking sleep operation. The `await asyncio.sleep()` call will suspend the execution of the surrounding coroutine (until the sleep operation is completed), thus allowing other tasks in the `event loop` to run. Similar examples are given here and here as well.
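A minimal sketch, again adapting the code from your question:

```python
import asyncio
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/ping")
async def ping(request: Request):
    print("Hello")
    await asyncio.sleep(5)  # non-blocking sleep: control returns to the event loop
    print("bye")
    return {"ping": "pong!"}
```

Both the endpoints above will print out the specified messages to the screen in the same order as mentioned in your question (if two requests arrived at around the same time), that is:

```
Hello
Hello
bye
bye
```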
**Important Note**
When using a browser to call the same endpoint for the second (third, and so on) time, please remember to do that from a tab that is isolated from the browser's main session; otherwise, succeeding requests (i.e., those coming after the first one) might be blocked by the browser (on the client side), as the browser might be waiting for a response to the previous request from the server before sending the next request. This is a common behaviour for the Chrome browser, at least, which waits to see the result of a request and check whether the result can be cached, before requesting the same resource again.
You could confirm that by using `print(request.client)` inside the endpoint, where you would see the `hostname` and `port` number being the same for all incoming requests (in case the requests were initiated from tabs opened in the same browser window/session); otherwise, the `port` number would normally be different for every request. Hence, those requests would be processed sequentially by the server, because the browser/client sent them sequentially in the first place. To overcome this, you could either:

- Reload the same tab (as it is running), or
- Open a new tab in an Incognito Window, or
- Use a different browser/client to send the request, or
- Use the `httpx` library to make asynchronous HTTP requests, along with the awaitable `asyncio.gather()`, which allows executing multiple asynchronous operations concurrently and then returns a list of results in the same order the awaitables (tasks) were passed to that function (have a look at this answer for more details). Example:
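A minimal sketch, assuming the server from your question is running at `http://localhost:8501`:

```python
import asyncio
import httpx

URLS = ['http://localhost:8501/ping'] * 2

async def send(url, client):
    return await client.get(url, timeout=10)

async def main():
    async with httpx.AsyncClient() as client:
        # send both requests concurrently and gather the responses in order
        tasks = [send(url, client) for url in URLS]
        responses = await asyncio.gather(*tasks)
        print(*[r.json() for r in responses], sep='\n')

asyncio.run(main())
```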
In case you had to call different endpoints that may take different amounts of time to process a request, and you would like to print the response out on the client side as soon as it is returned from the server (instead of waiting for `asyncio.gather()` to gather the results of all tasks and print them out in the same order the tasks were passed to the `send()` function), you could replace the `send()` function of the example above with the one shown below:
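A sketch of such a `send()` function:

```python
async def send(url, client):
    res = await client.get(url, timeout=10)
    print(res.json())  # print each response as soon as it arrives
    return res
```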
**`async`/`await` and Blocking I/O-bound or CPU-bound Operations**

If you are required to define a FastAPI endpoint (or a `StreamingResponse`'s generator, or a background task function) with `async def` (as you might need to `await` for some coroutines inside it), but also have some synchronous blocking I/O-bound or CPU-bound operation (computationally intensive task) that would block the `event loop` (essentially, the entire server) and wouldn't let other requests go through, for example:
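A minimal sketch of such a situation; here, `cpu_bound_task()` is a hypothetical placeholder for your own blocking function:

```python
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def cpu_bound_task(contents: bytes):
    ...  # placeholder for some computationally intensive (blocking) work

@app.post("/ping")
async def ping(file: UploadFile = File(...)):
    print("Hello")
    try:
        contents = await file.read()
        res = cpu_bound_task(contents)  # blocking call inside async def: blocks the event loop
    finally:
        await file.close()
    print("bye")
    return "pong"
```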
then:

You should check whether you could change your endpoint's definition to normal `def` instead of `async def`. For example, if the only method in your endpoint that has to be awaited is the one reading the file contents (as you mentioned in the comments section below), you could instead declare the type of the endpoint's parameter as `bytes` (i.e., `file: bytes = File()`), and thus FastAPI would read the file for you and you would receive the contents as `bytes`. Hence, there would be no need to use `await file.read()`. Please note that this approach should work for small files, as the entire file contents would be stored in memory (see the documentation on `File` Parameters); hence, if your system does not have enough RAM available to accommodate the accumulated data (if, for example, you have 8GB of RAM, you can't load a 50GB file), your application may end up crashing. Alternatively, you could call the `.read()` method of the `SpooledTemporaryFile` directly (which can be accessed through the `.file` attribute of the `UploadFile` object), so that, again, you don't have to `await` the `.read()` method; and as you can now declare your endpoint with normal `def`, each request will run in a separate thread (an example is given below). For more details on how to upload a `File`, as well as how Starlette/FastAPI uses `SpooledTemporaryFile` behind the scenes, please have a look at this answer and this answer.
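A sketch of the latter approach, reusing the `cpu_bound_task()` placeholder from the earlier sketch:

```python
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/ping")
def ping(file: UploadFile = File(...)):
    print("Hello")
    try:
        contents = file.file.read()  # synchronous read: no await required
        res = cpu_bound_task(contents)  # runs in a thread from the external threadpool
    finally:
        file.file.close()
    print("bye")
    return "pong"
```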
Use FastAPI's (Starlette's) `run_in_threadpool()` function from the `concurrency` module, as @tiangolo suggested here, which "will run the function in a separate thread to ensure that the main thread (where coroutines are run) does not get blocked" (see here). As described by @tiangolo here, "`run_in_threadpool` is an `await`able function; the first parameter is a normal function, the following parameters are passed to that function directly. It supports both sequence arguments and keyword arguments".
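A sketch, replacing the `res = cpu_bound_task(contents)` line inside the `async def` endpoint shown earlier:

```python
from fastapi.concurrency import run_in_threadpool

# runs cpu_bound_task in a separate thread and awaits the result
res = await run_in_threadpool(cpu_bound_task, contents)
```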
Alternatively, use `asyncio`'s `loop.run_in_executor()`, after obtaining the running `event loop` using `asyncio.get_running_loop()`, to run the task; in this case, you can `await` it to complete and return the result(s) before moving on to the next line of code. Passing `None` as the executor argument, the default executor will be used, which is a `ThreadPoolExecutor`:
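A sketch, again replacing the blocking line inside the `async def` endpoint shown earlier:

```python
import asyncio

loop = asyncio.get_running_loop()
# None selects the default executor, i.e., a ThreadPoolExecutor
res = await loop.run_in_executor(None, cpu_bound_task, contents)
```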
Or, if you would like to pass keyword arguments instead, you could use a `lambda` expression (e.g., `lambda: cpu_bound_task(some_arg=contents)`), or, preferably, `functools.partial()`, which is specifically recommended in the documentation for `loop.run_in_executor()`:
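A sketch, assuming a variant of `cpu_bound_task()` that accepts a `some_arg` keyword argument:

```python
import asyncio
from functools import partial

loop = asyncio.get_running_loop()
# partial() binds the keyword argument before the call is handed to the executor
res = await loop.run_in_executor(None, partial(cpu_bound_task, some_arg=contents))
```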
In Python 3.9+, you could also use `asyncio.to_thread()` to asynchronously run a synchronous function in a separate thread, which, essentially, uses `await loop.run_in_executor(None, func_call)` under the hood, as can be seen in the implementation of `asyncio.to_thread()`. The `to_thread()` function takes the name of a blocking function to execute, as well as any arguments (`*args` and/or `**kwargs`) to the function, and then returns a coroutine that can be `await`ed. Example:
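A sketch, with the same placeholder names as above:

```python
import asyncio

# runs cpu_bound_task(contents) in a separate thread (Python 3.9+)
res = await asyncio.to_thread(cpu_bound_task, contents)
```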
Note that, as explained in this answer, passing `None` to the `executor` argument does not create a new `ThreadPoolExecutor` every time you call `await loop.run_in_executor(None, ...)`; instead, it re-uses the default executor with the default number of worker threads (i.e., `min(32, os.cpu_count() + 4)`). Thus, depending on the requirements of your application, that number might be quite low. In that case, you should rather use a custom `ThreadPoolExecutor`. For instance:
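A sketch using a custom `ThreadPoolExecutor` (again inside the `async def` endpoint shown earlier):

```python
import asyncio
import concurrent.futures

loop = asyncio.get_running_loop()
with concurrent.futures.ThreadPoolExecutor() as pool:
    res = await loop.run_in_executor(pool, cpu_bound_task, contents)
```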
I would strongly recommend having a look at the linked answer above to learn about the difference between using `run_in_threadpool()` and `run_in_executor()`, as well as how to create a re-usable custom `ThreadPoolExecutor` at the application startup, and adjust the maximum number of worker threads as needed.
`ThreadPoolExecutor` will successfully prevent the `event loop` from being blocked, but won't give you the performance improvement you would expect from running code in parallel; especially when one needs to perform CPU-bound tasks, such as the ones described here (e.g., audio or image processing, machine learning, and so on). It is thus preferable to run CPU-bound tasks in a separate process, using `ProcessPoolExecutor`, as shown below, which, again, you can integrate with `asyncio`, in order to `await` it to finish its work and return the result(s). As described here, it is important to protect the entry point of the program to avoid recursive spawning of subprocesses, etc. Basically, your code must be under `if __name__ == '__main__'`.
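A sketch using `ProcessPoolExecutor` (the `if __name__ == '__main__'` guard mentioned above belongs in the module that spawns the subprocesses):

```python
import asyncio
import concurrent.futures

loop = asyncio.get_running_loop()
with concurrent.futures.ProcessPoolExecutor() as pool:
    # cpu_bound_task runs in a separate process, bypassing the GIL
    res = await loop.run_in_executor(pool, cpu_bound_task, contents)
```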
Again, I'd suggest having a look at the linked answer earlier on how to create a re-usable `ProcessPoolExecutor` at the application startup. You might find this answer helpful as well.

Use more workers to take advantage of multi-core CPUs, in order to run multiple processes in parallel and be able to serve more requests; for example, `uvicorn main:app --workers 4` (if you are using Gunicorn as a process manager with Uvicorn workers, please have a look at this answer). When using 1 worker, only one process is run. When using multiple workers, this will spawn multiple processes (all single-threaded). Each process has a separate Global Interpreter Lock (GIL), as well as its own `event loop`, which runs in the main thread of each process and executes all tasks in its thread. That means there is only one thread that can take a lock on the interpreter of each process; unless, of course, you employ additional threads, either outside or inside the `event loop`, e.g., when using a `ThreadPoolExecutor` with `loop.run_in_executor()`, or when defining endpoints, background tasks, or `StreamingResponse`'s generators with normal `def` instead of `async def`, as well as when calling `UploadFile`'s methods (see the first two paragraphs of this answer for more details).

Note: Each worker "has its own things, variables and memory". This means that `global` variables/objects, etc., won't be shared across the processes/workers. In this case, you should consider using a database storage, or Key-Value stores (Caches), as described here and here. Additionally, note that "if you are consuming a large amount of memory in your code, each process will consume an equivalent amount of memory".

If you need to perform heavy background computation and you don't necessarily need it to be run by the same process (for example, you don't need to share memory, variables, etc.), you might benefit from using other, bigger tools like Celery, as described in FastAPI's documentation.