itertools system error for large files


I am trying to extract the even lines from a very big file (~300 GB). The same code works for another file of almost the same size, but for this one I get an error. The code is:

import itertools
import sys, os

with open('FILE.fasta') as f:
    fd = open("FILE.txt","w")
    fd.writelines(set(itertools.islice(f, 0, None, 2)))
    fd.close()

And the error is:

Traceback (most recent call last):
  File "new3.py", line 7, in <module>
    fd.writelines(set(itertools.islice(f, 0, None, 2)))
SystemError: Negative size passed to PyString_FromStringAndSize

Do you think it is because the file is too big? I checked the memory usage while the code was running, and it never went above 50%.

I would appreciate any help!

Ashalynd On

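Don't build a set from the underlying iterator. It is an extremely expensive step: set() has to read every selected line and hold all of them in memory at once, and it also drops duplicate lines and loses their order. You should be able to pass the iterator to writelines directly: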

fd.writelines(itertools.islice(f, 0, None, 2))
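For context, here is a minimal sketch of the whole script with that change applied. The function name copy_every_other_line is made up for illustration, and I am assuming plain text files opened with the default settings. Note that, unlike the set() version, this keeps duplicate lines and preserves their order; I have not verified that it avoids the SystemError on the 300 GB file.

```python
import itertools


def copy_every_other_line(src_path, dst_path):
    """Stream lines 0, 2, 4, ... (zero-based) from src_path into dst_path.

    Hypothetical helper name, used only for this illustration.
    """
    # Both files are closed automatically, even if an error is raised.
    with open(src_path) as src, open(dst_path, "w") as dst:
        # islice yields lines lazily, so they are written as they are read
        # instead of being collected into a set or list first.
        dst.writelines(itertools.islice(src, 0, None, 2))
```

For the files in the question, the call would be copy_every_other_line("FILE.fasta", "FILE.txt").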

One other small nit:

You don't need to write

import sys, os

because neither sys nor os is used anywhere in the script shown. You can simply remove that line.