Is there a way to remove a single line in Python without wasting resources?


I have a very large .txt file, 13 GB in size. The file contains lines marked as [Invalid] at completely random places. Given how huge the file is, I don't think it's a good idea to run a naive script on my limited testing machine with a slow old HDD.

The file looks similar to the sample below. I also tried playing around with truncate(), but that only removed the last lines or cleared the file entirely (luckily I had a backup).

Vf1Ga0Qie6cxuc8o4cZK
XmQ71QRzm42Bju5DEGVn
[Invalid] diBWMYL67YfvawddJF3k
rjfUecVHkym7N0d5rJ4v

Does Python have a more efficient way of removing specific lines? I tried googling, but I couldn't find anything like this, only small scripts that rewrite the whole file. For example:

input_file = "badfile.txt"

with open(input_file, "r") as file:
    lines = file.readlines()

# In my opinion, this could also be optimized for a machine with
# limited RAM by reading the lines one by one

lines = [line for line in lines if "[Invalid]" not in line]

output_file = "badfile.txt"

with open(output_file, "w") as file:
    file.writelines(lines)
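As a sketch of the line-by-line idea mentioned in the comment above (assuming the `badfile.txt` name from the question, and writing to a temporary file in the same directory so the swap is atomic):

```python
import os
import tempfile

input_file = "badfile.txt"  # assumed filename from the question

# Create a small sample file like the one in the question (for demonstration).
with open(input_file, "w") as f:
    f.write("Vf1Ga0Qie6cxuc8o4cZK\n")
    f.write("[Invalid] diBWMYL67YfvawddJF3k\n")
    f.write("rjfUecVHkym7N0d5rJ4v\n")

# Stream line by line: only one line is held in RAM at a time,
# instead of loading all 13 GB with readlines().
with open(input_file, "r") as src, tempfile.NamedTemporaryFile(
    "w", dir=".", delete=False
) as tmp:
    for line in src:
        if "[Invalid]" not in line:
            tmp.write(line)

# Atomically swap the filtered copy into place of the original.
os.replace(tmp.name, input_file)
```

This keeps memory usage constant, but it still rewrites the whole file on disk; with a plain .txt there is no way to delete a line in the middle without shifting everything after it.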

1 Answer


You can try the fileinput module (see its documentation). With inplace=True it reads your file and redirects print() output back into the source file itself, so be careful!

import fileinput

word = "[Invalid]"
with fileinput.input(files="badfile.txt", encoding="utf-8", inplace=True) as f:
    for line in f:
        if word not in line:
            print(line, end='')
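Since inplace mode overwrites the original, fileinput can keep a safety copy via its backup parameter (part of the standard library API). A minimal sketch, again assuming the badfile.txt name from the question:

```python
import fileinput

# Create a small sample file to filter (demonstration data from the question).
with open("badfile.txt", "w") as f:
    f.write("XmQ71QRzm42Bju5DEGVn\n")
    f.write("[Invalid] diBWMYL67YfvawddJF3k\n")

# backup=".bak" saves the original as badfile.txt.bak before rewriting.
with fileinput.input("badfile.txt", inplace=True, backup=".bak") as f:
    for line in f:
        if "[Invalid]" not in line:
            print(line, end="")
```

If anything goes wrong mid-run, the untouched original is still available as badfile.txt.bak.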