I provisioned and configured a Fedora 34 VM on VirtualBox with 2048 MB RAM to serve this FastAPI application on localhost:7070. The full application source code, dependency code, and instructions are here. Below is the smallest reproducible example I could make.
main.py
import os, pathlib
import fastapi as fast
import aiofiles

ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
RESULTS_DIR = pathlib.Path('/'.join((ROOT_DIR, 'results')))

app = fast.FastAPI()


@app.post('/api')
async def upload(
        request: fast.Request,
        file: fast.UploadFile = fast.File(...),
        filedir: str = ''):
    dest = RESULTS_DIR.joinpath(filedir, file.filename)
    dest.parent.mkdir(parents=True, exist_ok=True)
    async with aiofiles.open(dest, 'wb') as buffer:
        await file.seek(0)
        contents = await file.read()
        await buffer.write(contents)
    return f'localhost:7070/{dest.parent.name}/{dest.name}'
start.sh
the server application
#! /bin/bash
uvicorn --host "0.0.0.0" --log-level debug --port 7070 main:app
client.py
import httpx
from pathlib import Path
import asyncio


async def async_post_file_req(url: str, filepath: Path):
    # Disable all timeouts so a large upload is not cut off client-side.
    async with httpx.AsyncClient(
            timeout=httpx.Timeout(write=None, read=None, connect=None, pool=None)) as client:
        r = await client.post(
            url,
            files={
                'file': (filepath.name, filepath.open('rb'), 'application/octet-stream')
            }
        )


if __name__ == '__main__':
    url = 'http://localhost:7070'
    asyncio.run(
        async_post_file_req(
            f'{url}/api',
            # pathlib does not expand '~' on its own, so expand it explicitly.
            Path('~/1500M.txt').expanduser()
        ))
create a 1500 MB file
truncate -s 1500M 1500M.txt
When uploading a 1500 MB file, the current implementation of upload appears to read the whole file into memory, and then the server responds with {status: 400, reason: 'Bad Request', details: 'There was an error parsing the body.'}, and the file is not written to disk. When uploading an 825 MB file, the server responds with 200, and the file is written to disk. I don't understand why there is an error in parsing the larger file.
What's going on?
How do I upload files that are larger than the machine's available memory?
Do I have to stream the body?
Digging into the source code, I found that FastAPI throws the HTTP exception with status code 400 and detail 'There was an error parsing the body' exactly once in the source code, when it is trying to figure out if the request form or body needs to be read. The FastAPI Request is basically the Starlette Request, so I reimplemented the FastAPI server application as a Starlette application, hoping it would bypass this exception handler and give me more information about this issue.
main.py
Pipfile
On posting a file of size 989 MiB or larger, the Starlette application throws OS error 28, no space left on device. A file of size 988 MiB or less caused no error.
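OS error 28 is ENOSPC, which concerns disk space rather than memory. As a rough check, the stdlib-only sketch below prints the temporary directory Python would use and how much space is free there, then shows how a tempfile.SpooledTemporaryFile stays in memory up to max_size and spills to disk once that is exceeded. The 1 MiB limit is chosen for illustration, and `_rolled` is a private CPython attribute used here only to make the rollover visible.

```python
import shutil
import tempfile

# Where tempfile (and therefore SpooledTemporaryFile) puts on-disk data.
tmp_dir = tempfile.gettempdir()
free_mib = shutil.disk_usage(tmp_dir).free // (1024 * 1024)
print(f"temporary directory: {tmp_dir}, free space: {free_mib} MiB")

MAX_SIZE = 1024 * 1024  # 1 MiB spool limit, chosen for illustration

with tempfile.SpooledTemporaryFile(max_size=MAX_SIZE) as spool:
    spool.write(b"x" * MAX_SIZE)   # exactly max_size: still held in memory
    in_memory = not spool._rolled  # _rolled is private to CPython's tempfile
    spool.write(b"x")              # one byte over the limit: rolls over to disk
    on_disk = spool._rolled

print(in_memory, on_disk)
```

If the free space reported for the temporary directory is smaller than the upload, anything larger than max_size cannot be spooled there. tempfile.gettempdir() consults the TMPDIR, TEMP, and TMP environment variables, so starting the server with TMPDIR pointing at a larger volume is one way to move the spool files.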
Starlette's UploadFile data structure uses a SpooledTemporaryFile. This object writes to your OS's temporary directory. My temporary directory is /tmp because I'm on Fedora 34, and I have not created any environment variables to tell Python to use anything else as a temporary directory.

Starlette sets max_size for the SpooledTemporaryFile to 1 MiB. From the Python tempfile documentation, that means the upload is held in memory only until it exceeds 1 MiB, after which it is rolled over to a file in the temporary directory. Although it is off by 1 MiB, 989 MiB appears to be the correct hard boundary on the UploadFile size because the SpooledTemporaryFile is bound by the storage available to the system's temporary directory.

If I still want to use UploadFile, I can create an environment variable (Python's tempfile checks TMPDIR, TEMP, and TMP, in that order) to point to a device that is known to always have enough space available, even for the largest uploads.

The approach I prefer uses the request's stream, to avoid having to write the file twice, first to a local temporary directory, and second to a local permanent directory.

Using this approach, when I uploaded files (5M, 450M, 988M, each with two repeated measures) to the server running on a Fedora VM with 2048 MiB memory, the server never used up too much memory, never crashed, and the average latency reduction was 40% (i.e. the latency of posting to /stream was about 60% of the latency of posting to /api).
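A minimal sketch of the streaming idea (not the exact /stream handler), assuming the body arrives as an async iterator of byte chunks, which is what Starlette's request.stream() provides. The names save_stream and fake_body are hypothetical, and the blocking file writes are a simplification to keep the sketch free of third-party dependencies.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import AsyncIterator


async def save_stream(chunks: AsyncIterator[bytes], dest: Path) -> int:
    """Write an async stream of byte chunks to dest; return bytes written."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    # Only one chunk is held in memory at a time. Plain blocking writes keep
    # the sketch dependency-free; aiofiles could be swapped in instead.
    with dest.open("wb") as out:
        async for chunk in chunks:
            out.write(chunk)
            written += len(chunk)
    return written


async def fake_body(n_chunks: int, chunk_size: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for request.stream(): yields fixed-size chunks."""
    for _ in range(n_chunks):
        yield b"\0" * chunk_size


def demo() -> int:
    # Write four 64 KiB chunks into a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp) / "results" / "upload.bin"
        written = asyncio.run(save_stream(fake_body(4, 64 * 1024), dest))
        assert dest.stat().st_size == written
        return written


total = demo()
print(total)
```

In a FastAPI or Starlette handler the same helper would be called as `await save_stream(request.stream(), dest)`. This assumes the client sends the file as the raw request body (for example with httpx's content= argument) rather than as multipart form data, because request.stream() yields the body bytes unparsed. The file name then has to travel separately, for instance in a query parameter or a header.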