Serializing, compressing and writing large object to file in one go takes too much memory

I have a list, objects, of very large objects that I want to compress and save to the hard drive.

My current approach is

import pickle

import brotli
import dill

# serialize list of objects
objects_serialized = dill.dumps(objects, pickle.HIGHEST_PROTOCOL)
# compress serialized string
objects_serialized_compressed = brotli.compress(objects_serialized, quality=1)
# write compressed string to file (output is assumed to be a file opened in binary write mode)
output.write(objects_serialized_compressed)

However, if objects is very large, this leads to a memory error, since -- for some time -- I simultaneously carry objects, objects_serialized, objects_serialized_compressed around in their entirety.

Is there a way to do this chunk-wise? Presumably the first step -- serializing the objects -- has to be done in one go, but perhaps the compression and writing to file can be done chunk-wise?

Best answer, by Memristor:

After many attempts, this is what I'd try:

import brotli
import dill
import io
import pickle

# The serialized form of this list is about 30 KB
objects = ["234r234r234", "3f234f2343f3", "234ff234f234f234rf32"]*5000
objects_serialized = dill.dumps(objects, pickle.HIGHEST_PROTOCOL)

# Set up a buffer for reading chunks of serialized data
chunk_size = 1024 * 1024
buffer = io.BytesIO(objects_serialized)

# Create compressor for repeated use
compressor = brotli.Compressor(quality=1)
with open('output.brotli', 'wb') as output:
    # Read chunks from the buffer and compress them
    while True:
        chunk = buffer.read(chunk_size)
        if not chunk:
            break
        compressed_chunk = compressor.process(chunk)
        output.write(compressed_chunk)

    # Flush the remaining compressed data
    compressed_remainder = compressor.finish()
    output.write(compressed_remainder)

# The resulting file is about 4 KB on my computer.
# I decompressed and de-serialized it, and got the original object back.

This requires brotli 1.0.9, as provided by pip -- it does not work with brotlipy, as provided by anaconda.