Errors using python magic when filtering large numbers of files based on MIME type


I have a set of files in a directory, and I'm using the python-magic library to keep the files whose MIME type is "text/plain" and delete all the others. Below is the code I'm using:

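import os
import magic


def ftype(path):
    # Create the detector once and reuse it for every file.
    mime = magic.Magic(mime=True)
    for root, dirs, fnames in os.walk(path):
        for fname in fnames:
            # Join with root, not path: os.walk also descends into subdirectories.
            full_path = os.path.join(root, fname)
            mi = mime.from_file(full_path)
            # Delete everything that is not reported as text/plain.
            if mi != 'text/plain':
                os.remove(full_path)
                print(fname)


ftype('filepath')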

The script runs successfully on a small set of files. However, when I ran it on a directory containing about 40,000 files, I got the error below.

Traceback (most recent call last):
  File "C:\Users\dmg\AppData\Local\Programs\Python\Python37\lib\site-packages\magic\magic.py", line 91, in from_file
    return self._handle509Bug(e)
  File "C:\Users\dmg\AppData\Local\Programs\Python\Python37\lib\site-packages\magic\magic.py", line 100, in _handle509Bug
    raise e
  File "C:\Users\dmg\AppData\Local\Programs\Python\Python37\lib\site-packages\magic\magic.py", line 89, in from_file
    return maybe_decode(magic_file(self.cookie, filename))
  File "C:\Users\dmg\AppData\Local\Programs\Python\Python37\lib\site-packages\magic\magic.py", line 255, in magic_file
    return _magic_file(cookie, coerce_filename(filename))
  File "C:\Users\dmg\AppData\Local\Programs\Python\Python37\lib\site-packages\magic\magic.py", line 196, in errorcheck_null
    raise MagicException(err)
magic.magic.MagicException: b"line I64u: regex error 14 for `^[[:space:]]*class[[:space:]]+[[:digit:][:alpha:]:_]+[[:space:]]*\\{(.*[\n]*)*\\}(;)?$', (failed to get memory)"

I'm not sure what the issue is. Can someone help me with this, or suggest an alternative approach to the operation described above?

Update: The issue still exists after trying some of the methods suggested in the comments below.
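Since the question asks for alternative approaches, here is one possible sketch, not a confirmed fix. The "failed to get memory" message suggests that libmagic ran out of memory while applying a regex to one particular file, so the sketch makes two assumptions: that passing only the first bytes of each file to `from_buffer` keeps that regex input small, and that it is acceptable to skip and report any file that still fails instead of aborting the whole run. The names `remove_non_text_files`, `magic_detector`, `sample_size`, and `dry_run` are hypothetical helpers introduced for illustration.

```python
import os


def magic_detector():
    """Return a bytes -> MIME string callable backed by python-magic.

    The import is done lazily so the walking logic below can be tried
    with any other detector as well.
    """
    import magic
    return magic.Magic(mime=True).from_buffer


def remove_non_text_files(path, detect_mime, sample_size=2048, dry_run=True):
    """Walk `path` and collect (and optionally delete) non text/plain files.

    detect_mime: callable taking a bytes sample and returning a MIME string.
    sample_size: number of leading bytes handed to the detector (an assumed
                 default, not a value taken from the question).
    dry_run:     when True, nothing is deleted; the candidates are only listed.
    Returns (removed, failed), where failed holds (path, exception) pairs.
    """
    removed, failed = [], []
    for root, _dirs, fnames in os.walk(path):
        for fname in fnames:
            full_path = os.path.join(root, fname)
            try:
                with open(full_path, "rb") as fh:
                    sample = fh.read(sample_size)
                mime = detect_mime(sample)
            except Exception as exc:
                # Covers unreadable files and detector errors such as
                # magic.MagicException; record the file and move on.
                failed.append((full_path, exc))
                continue
            if mime != "text/plain":
                removed.append(full_path)
                if not dry_run:
                    os.remove(full_path)
    return removed, failed
```

A first pass could call `remove_non_text_files('filepath', magic_detector())` with the default `dry_run=True`, inspect the `failed` list to see which files trigger the exception, and only then rerun with `dry_run=False`. Detection from a short sample may classify some files differently than `from_file` does, so the dry run is worth checking before deleting anything.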
