Memory Error in Python when trying to convert large files to binary


I'm working on developing a program to generate a dictionary containing binaries of various files for a Minecraft modpack. Everything works smoothly when processing smaller files. However, when attempting to process larger files, such as the "Better End Reforked" mod which is 90MB in size, the program encounters a "Memory Error" and fails.

This is my code:

from os      import listdir
from os.path import abspath, splitext, basename, getsize, isfile, join, expandvars

def getBool(statement):
    return (True if statement else False)

def getFiles(path):
    list = []

    for file in listdir(path):
        filePath = join(path, file)
        if isfile(filePath):
            list.append((file, abspath(filePath)))
    
    return list

def getIdentity(path):
    name = basename(path)
    
    return {
        'name': name,
        'basename': splitext(name)[0], 
        'extension': splitext(name)[1],
        'realpath': abspath(expandvars(path)),
        'size': getsize(path)
    }

def fromFileToBinary(path):
    try:
        with open(path, 'rb') as file:
            binary = file.read()
            return binary.hex()
    except IOError:
        return None


output = getIdentity('./output/bin.py')['realpath']

config       = 'C:/Users/Berdy Alexei/Downloads/modpack/optional/config'
mod          = 'C:/Users/Berdy Alexei/Downloads/modpack/optional/mods'
resourcepack = 'C:/Users/Berdy Alexei/Downloads/modpack/optional/resourcepacks'
script       = 'C:/Users/Berdy Alexei/Downloads/modpack/optional/scripts'

bool = getBool(config or mod or resourcepack or script)

def _bin(path = None, bool = True):
    dictionary = {}

    if path and bool:
        files = getFiles(path)
        
        for file in files:
            name, path = file
            dictionary[name] = fromFileToBinary(path)

    return dictionary

content = {
    'default': {
        'config':       _bin('C:/Users/Berdy Alexei/Downloads/modpack/default/config', False),
        'mods':         _bin('C:/Users/Berdy Alexei/Downloads/modpack/default/mods', False),
        'resourcepack': _bin('C:/Users/Berdy Alexei/Downloads/modpack/default/resourcepacks', False),
        'script':       _bin('C:/Users/Berdy Alexei/Downloads/modpack/default/scripts', False)
    },
    'optional': (bool, {
        'config':       _bin(config),
        'mods':         _bin(mod),
        'resourcepack': _bin(resourcepack),
        'script':       _bin(script)
    })
}

with open(output, "w") as file:
    file.write('BIN = {}'.format(content))

This is the error:

Traceback (most recent call last):
  File "c:\Folders\Archivos\Proyectos\InstallerCrafter\InstallerCrafter.py", line 75, in <module>
    file.write('BIN = {}'.format(content))
MemoryError

These are the files I'm processing: [screenshot of the file list]

Berdy Alexei Cadaeib Fecei On BEST ANSWER

My solution:

def fromFileToBinary(path, size = 4):
    try:
        binary = ''

        with open(path, 'rb') as file:
            while True:
                chunk = file.read(1024 *  size)

                if not chunk:
                    break

                binary += chunk.hex()
                del chunk

        return binary
    except IOError:
        return None

The disadvantage is that generating the hex string this way takes a long time. That can be improved by using the chunked path only when the file exceeds a certain size, and reading it in one go otherwise (see the 'size' entry returned by the "getIdentity" function in the code above).
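The size-based switch mentioned above could look roughly like this sketch. The name fromFileToBinaryAuto and the 8 MB threshold are assumptions made for illustration, not part of the original code. The chunks are collected in a list and joined once at the end, which avoids growing a string with repeated +=.

```python
from os.path import getsize

# Hypothetical threshold: files larger than this are read in chunks.
THRESHOLD = 8 * 1024 * 1024

def fromFileToBinaryAuto(path, size=4, threshold=THRESHOLD):
    try:
        # Small files: read everything at once, as in the question.
        if getsize(path) <= threshold:
            with open(path, 'rb') as file:
                return file.read().hex()

        # Large files: read `size` KiB at a time and join once at the end.
        parts = []
        with open(path, 'rb') as file:
            while True:
                chunk = file.read(1024 * size)
                if not chunk:
                    break
                parts.append(chunk.hex())
        return ''.join(parts)
    except OSError:  # IOError is an alias of OSError in Python 3
        return None
```

Note that either branch still returns the whole hex string, so this only reduces the peak memory used while reading, not the size of the final result.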

Miodek On

A MemoryError means that your program has run out of memory for the data it is trying to hold. A workaround is to handle the binary data in smaller chunks.

Right now, the _bin function returns a huge object that your computer cannot hold in memory, let alone do anything with. Instead, you can make the function read each file, say, 1024 bytes at a time (read() takes an optional argument giving the number of bytes to read). You can then store those chunks in a list, which the function returns. It would look something like this:

def fromFileToBinary(path, chunk_size=1024):
    chunks = []
    try:
        with open(path, 'rb') as file:
            # Keep reading until read() returns b'' at end of file.
            while True:
                binary = file.read(chunk_size)
                if not binary:
                    break
                chunks.append(binary.hex())
    except IOError:
        return None
    return chunks

Good thing you already handled the possibility of an IOError.
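Since the traceback in the question points at the final file.write('BIN = {}'.format(content)) call, another option is to never hold the hex text in memory at all and write each chunk to the output file as it is read. The following is only a sketch under assumptions: writeHexChunks and writeBin are hypothetical names, and the output is a flat {name: hex} dictionary rather than the nested structure from the question.

```python
def writeHexChunks(path, out, chunk_size=64 * 1024):
    # Stream the hex form of the file at `path` into the open text file `out`.
    with open(path, 'rb') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            out.write(chunk.hex())

def writeBin(files, output):
    # `files` is a list of (name, path) pairs, like the one getFiles() returns.
    with open(output, 'w') as out:
        out.write('BIN = {\n')
        for name, path in files:
            # Hex text only contains 0-9 and a-f, so plain quotes are safe.
            out.write("    {!r}: '".format(name))
            writeHexChunks(path, out)
            out.write("',\n")
        out.write('}\n')
```

For example, writeBin(getFiles(mod), output) would write one entry per mod file while keeping only one chunk in memory at a time.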

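I also don't see the need for the getBool function. Combining objects with or does not produce a boolean (it returns the first truthy operand, or the last operand if none is truthy), but any object can be used directly in a condition, and the built-in bool() already does the conversion when you need an actual True or False.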
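A quick check of how or behaves in Python: it returns one of its operands rather than a bool, and the built-in bool() performs the conversion that getBool does. The paths below are placeholders chosen for illustration.

```python
config = 'C:/modpack/optional/config'  # hypothetical path
mod = ''                               # empty string is falsy

result = config or mod
print(type(result).__name__)  # str
print(bool(config or mod))    # True
print(bool('' or None))       # False

# In a condition, no conversion is needed at all.
if config or mod:
    print('at least one path is set')
```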

One more thing: naming conventions. As NoBlockhit very accurately suggested, do not use variable names that shadow existing built-ins, such as bool or list. And while a leading _ in a name is perfectly fine, it conventionally marks the name as internal, that is, not meant to be used outside the module or class that defines it.