FastAPI throws 400 Bad Request when I upload a large file


I provisioned and configured a Fedora 34 VM on VirtualBox with 2048 MB RAM to serve this FastAPI application on localhost:7070. The full application source code, dependencies, and instructions are here. Below is the smallest reproducible example I could make.

main.py

import os, pathlib

import fastapi as fast
import aiofiles

        
ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
RESULTS_DIR = pathlib.Path(ROOT_DIR) / 'results'

app = fast.FastAPI()

@app.post('/api')
async def upload(
    request: fast.Request, 
    file: fast.UploadFile = fast.File(...),
    filedir: str = ''):
        
    dest = RESULTS_DIR.joinpath(filedir, file.filename)
    dest.parent.mkdir(parents=True, exist_ok=True)

    async with aiofiles.open(dest, 'wb') as buffer:
        await file.seek(0)
        contents = await file.read()
        await buffer.write(contents)

    return f'localhost:7070/{dest.parent.name}/{dest.name}'

start.sh to start the server

#! /bin/bash
uvicorn --host "0.0.0.0" --log-level debug --port 7070 main:app

client.py

import httpx
from pathlib import Path
import asyncio

async def async_post_file_req(url: str, filepath: Path):    
    async with httpx.AsyncClient(
        timeout=httpx.Timeout(write=None, read=None, connect=None, pool=None)) as client:
        r = await client.post(
            url, 
            files={
                'file': (filepath.name, filepath.open('rb'), 'application/octet-stream')
            }
        )

if __name__ == '__main__':
    url = 'http://localhost:7070'
    asyncio.run(
        async_post_file_req(
            f'{url}/api',            
            Path('~/1500M.txt').expanduser()  # expanduser() so '~' resolves to the home directory
    ))

Create a 1500 MB file

truncate -s 1500M 1500M.txt

When uploading a 1500 MB file, the current implementation of upload appears to read the whole file into memory, and then the server responds with {status: 400, reason: 'Bad Request', details: 'There was an error parsing the body.'}, and the file is not written to disk. When uploading an 825 MB file, the server responds with 200, and the file is written to disk. I don't understand why there is an error in parsing the larger file.

What's going on?

How do I upload files that are larger than the machine's available memory?

Do I have to stream the body?

1 Answer


Digging into the source code, I found that FastAPI raises the HTTP exception with status code 400 and detail "There was an error parsing the body" in exactly one place: when it tries to work out whether the request form or body needs to be read. The FastAPI Request is basically the Starlette Request, so I reimplemented the FastAPI server as a Starlette application, hoping to bypass this exception handler and get more information about the underlying error.
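
Condensed from that part of fastapi/routing.py (a paraphrase from memory, not a verbatim excerpt; request_handler_sketch, is_body_form, and request stand in for FastAPI's internal state), the handler wraps body parsing roughly like this, which is why the real exception surfaces as a generic 400:

from fastapi import HTTPException, Request


async def request_handler_sketch(request: Request, is_body_form: bool):
    # paraphrased sketch of FastAPI's request handler, not verbatim source
    try:
        if is_body_form:
            # multipart bodies are parsed here via Starlette's form parser
            body = await request.form()
        else:
            body = await request.json()
    except Exception as e:
        # the original exception (an OSError, as it turns out below) is
        # swallowed and replaced with a generic 400, hiding the real cause
        raise HTTPException(
            status_code=400, detail='There was an error parsing the body'
        ) from e
    return body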

main.py

from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route


async def homepage(request):
    return JSONResponse({'hello': 'world'})


async def upload(request):
    form = await request.form()
    print(type(form['upload_file']))
    filename = form['upload_file'].filename or 'not found'
    contents = await form['upload_file'].read()
    b = len(contents) or -1
    return JSONResponse({
        'filename': filename,
        'bytes': b
    })


app = Starlette(debug=True, routes=[
    Route('/', homepage),
    Route('/api', upload, methods=['POST'])
])

Pipfile

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
starlette = "*"
uvicorn = "*"
uvloop = "*"
httpx = "*"
watchgod = "*"
python-multipart = "*"

[dev-packages]

[requires]
python_version = "3.9"

On posting a file of 989 MiB or larger, the Starlette application raises OSError 28 (no space left on device). A file of 988 MiB or less caused no error.

INFO:     10.0.2.2:46996 - "POST /api HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.9/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/home/vagrant/star-file-server/./main.py", line 11, in upload
    form = await request.form()
  File "/usr/local/lib/python3.9/site-packages/starlette/requests.py", line 240, in form
    self._form = await multipart_parser.parse()
  File "/usr/local/lib/python3.9/site-packages/starlette/formparsers.py", line 231, in parse
    await file.write(message_bytes)
  File "/usr/local/lib/python3.9/site-packages/starlette/datastructures.py", line 445, in write
    await run_in_threadpool(self.file.write, data)
  File "/usr/local/lib/python3.9/site-packages/starlette/concurrency.py", line 40, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/usr/lib64/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib64/python3.9/tempfile.py", line 755, in write
    rv = file.write(s)
OSError: [Errno 28] No space left on device

Starlette's UploadFile data structure uses a SpooledTemporaryFile, which writes to your OS's temporary directory. My temporary directory is /tmp because I'm on Fedora 34 and have not set any environment variable telling Python to use anything else.

[vagrant@fedora star-file-server]$ python
Python 3.9.5 (default, May 14 2021, 00:00:00) 
[GCC 11.1.1 20210428 (Red Hat 11.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tempfile
>>> tempfile.gettempdir()
'/tmp'
[vagrant@fedora star-file-server]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        974M     0  974M   0% /dev
tmpfs           989M  168K  989M   1% /dev/shm
tmpfs           396M  5.6M  390M   2% /run
/dev/sda1        40G  1.6G   36G   5% /
tmpfs           989M     0  989M   0% /tmp
tmpfs           198M   84K  198M   1% /run/user/1000

Starlette sets max_size for the SpooledTemporaryFile to 1 MiB. Per the Python tempfile documentation, that means the upload is buffered in memory only until it exceeds 1 MiB, after which it is rolled over to a real file in the system's temporary directory. So, give or take that 1 MiB, 989 MiB is indeed the hard boundary on the UploadFile size here: the SpooledTemporaryFile is bound by the storage available in the temporary directory, and on this VM /tmp is a 989 MiB tmpfs (which is itself backed by memory).
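
A minimal sketch of that rollover behavior (_rolled is a private CPython implementation detail, used here only for illustration):

import tempfile

# SpooledTemporaryFile buffers in memory until max_size is exceeded,
# then rolls the contents over to a real file under tempfile.gettempdir()
with tempfile.SpooledTemporaryFile(max_size=1024 * 1024) as f:
    f.write(b'x' * 1024 * 1024)   # exactly 1 MiB: still buffered in memory
    print(f._rolled)              # False
    f.write(b'x')                 # one byte over max_size
    print(f._rolled)              # True: now backed by a file on disk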

If I still want to use UploadFile, I can set an environment variable to point the temporary directory at a device known to always have enough space, even for the largest uploads.

export TMPDIR=/huge_storage_device
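
Python's tempfile module resolves the temporary directory from the environment, so no application changes are needed. A quick check, assuming the export above is in effect:

import tempfile

# Starlette's SpooledTemporaryFile ends up in the same directory
print(tempfile.gettempdir())  # /huge_storage_device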

The approach I prefer uses the request stream, which avoids writing the file twice: first to a local temporary directory and then again to its permanent destination.

import os, pathlib

import fastapi as fast
import aiofiles

# same results directory as in the original main.py
ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
RESULTS_DIR = pathlib.Path(ROOT_DIR) / 'results'

app = fast.FastAPI()


@app.post('/stream')
async def stream(
    request: fast.Request,
    filename: str,
    filedir: str = ''
):

    dest = RESULTS_DIR.joinpath(filedir, filename)
    dest.parent.mkdir(parents=True, exist_ok=True)        

    async with aiofiles.open(dest, 'wb') as buffer:
        # write each chunk to disk as it arrives; only one chunk
        # of the body is held in memory at a time
        async for chunk in request.stream():
            await buffer.write(chunk)

    return {
        'loc': f'localhost:7070/{dest.parent.name}/{dest.name}'
    }   
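
For completeness, here is a client for /stream; this snippet is mine, not part of the original application. Because the handler reads the raw request body, the file is sent as the body rather than as multipart form data, and a generator keeps client-side memory flat as well.

import httpx
from pathlib import Path


def file_chunks(filepath: Path, chunk_size: int = 1024 * 1024):
    # yield the file in 1 MiB pieces so httpx sends a streamed, chunked
    # request body instead of loading the whole file into memory
    with filepath.open('rb') as f:
        while chunk := f.read(chunk_size):
            yield chunk


def stream_post_file(url: str, filepath: Path):
    # filename travels as a query parameter, matching the endpoint signature
    return httpx.post(
        f'{url}/stream',
        params={'filename': filepath.name},
        content=file_chunks(filepath),
        timeout=httpx.Timeout(None),
    )


if __name__ == '__main__':
    r = stream_post_file('http://localhost:7070', Path('1500M.txt'))
    print(r.json())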

Using this approach, when I uploaded files (5M, 450M, and 988M, each with two repeated measures) to the server running on a Fedora VM with 2048 MiB of memory, the server's memory use stayed modest, it never crashed, and the average latency dropped by 40% (i.e. the latency of posting to /stream was about 60% of the latency of posting to /api).