How to upload file to FastAPI in chunks without saving to hard drive?

The goal: upload a file to FastAPI in chunks and process them without saving anything to the hard drive. I have reviewed several similar topics, for example issue73442335 and issue70520522. In issue65342833 it was said that Starlette saves to the hard drive any file larger than 1 MB. As I'm going to forward my chunks to a cloud, I don't want any temp files stored on disk. Is it possible to keep both the chunk AND the payload in memory before processing them further?

Here is my code.

app.py

from fastapi import FastAPI, Request
from pydantic import parse_obj_as
from pydantic.dataclasses import dataclass
import json

app = FastAPI()


@dataclass
class Payload:
    part: str
    total_size: int
    chunk_size: int

    def __repr__(self):
        return f"<FormData: part={self.part} total_size={self.total_size}>"


@app.post('/upload')
async def upload(request: Request):
    filename = request.headers.get('filename')

    # The chunk bytes and their metadata arrive as multipart form fields
    transferred_data = await request.form()
    file_chunk = transferred_data['file_chunk'].file.read()
    json_loads = json.loads(transferred_data['payload'])
    payload = parse_obj_as(Payload, json_loads)

I'm not sure whether these two lines

transferred_data = await request.form()
file_chunk = transferred_data['file_chunk'].file.read()

keep the data in memory, since request.form() is based on Starlette and wraps the file in an UploadFile. By the way, how can I check whether FastAPI writes anything to the hard drive?
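
One way to check, as a sketch that relies on the private _rolled attribute of tempfile.SpooledTemporaryFile (the object Starlette uses to back UploadFile, so this is an implementation detail rather than a public API), is a throwaway debug endpoint that inspects the parsed form:

from fastapi import FastAPI, Request
from starlette.datastructures import UploadFile

app = FastAPI()


@app.post('/upload-debug')  # hypothetical endpoint, only for inspecting spooling behaviour
async def upload_debug(request: Request):
    form = await request.form()
    chunk = form['file_chunk']
    # Starlette exposes multipart files as UploadFile objects whose .file is a
    # SpooledTemporaryFile: data stays in memory until max_size is exceeded,
    # after which it "rolls over" to a real temporary file on disk.
    backing = chunk.file if isinstance(chunk, UploadFile) else chunk
    return {
        'wrapper': type(chunk).__name__,
        'backing': type(backing).__name__,
        'rolled_to_disk': getattr(backing, '_rolled', None),  # False = in memory, True = on disk
    }

If the 1 MB threshold mentioned above applies, a 5 MB chunk would be expected to report rolled_to_disk: true.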

test.py

import json
import os
import secrets
import requests
from requests_toolbelt import MultipartEncoder

CHUNK_SIZE = 1024 * 1024 * 5  # 5 MB
URL = 'http://localhost:8000/upload'
FILE_PATH = 'temp.txt'

counter = 1

with open(FILE_PATH, 'rb') as f:
    while True:
        chunk = f.read(CHUNK_SIZE)
        if not chunk:
            break
        payload = {
            'part': str(counter),
            'total_size': str(os.path.getsize(FILE_PATH)),
            'chunk_size': str(CHUNK_SIZE),
        }
        m = MultipartEncoder(
            fields={
                'payload': json.dumps(payload),
                'file_chunk': ('file_chunk', chunk, 'application/octet-stream')
            }
        )
        response = requests.post(URL, headers={'Content-Type': m.content_type, 'filename': f.name}, data=m, stream=True)
        print(response.text)
        counter += 1

Another possible solution could be an adjustment of the Starlette class MultiPartParser and its max_file_size attribute. Perhaps a custom middleware would help.
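
As a sketch of that idea, assuming a Starlette version where MultiPartParser.max_file_size is the class-level threshold passed to SpooledTemporaryFile (worth verifying against the installed version), the limit could be raised at import time so that chunks of the size sent by test.py stay in memory:

from fastapi import FastAPI
from starlette.formparsers import MultiPartParser

# Raise the spool threshold so multipart parts up to ~10 MB are kept in memory
# instead of being rolled over to a temporary file on disk.
# This is a Starlette implementation detail, not a documented FastAPI setting.
MultiPartParser.max_file_size = 10 * 1024 * 1024  # above the 5 MB CHUNK_SIZE used by the client

app = FastAPI()

The trade-off is that every multipart request may then buffer that much data in RAM, so the value should stay close to the chunk size the client actually sends.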
