I have a very large .txt file, about 13 GB in size. Scattered at completely random places in the file there are lines marked as [Invalid]. Since the file is that huge, I don't think it's a good idea to run a script that rewrites the whole thing on my limited testing machine with a slow old HDD.
The file I'm working with looks similar to the sample below. I also tried playing around with truncate(), but that only removes the last lines or clears the file entirely (luckily I had a backup).
Vf1Ga0Qie6cxuc8o4cZK
XmQ71QRzm42Bju5DEGVn
[Invalid] diBWMYL67YfvawddJF3k
rjfUecVHkym7N0d5rJ4v
Perhaps Python has a more efficient way of removing specific lines? I tried googling for an answer, but I couldn't find anything like this, only small scripts that rewrite the whole file. For example:
input_file = "badfile.txt"

with open(input_file, "r") as file:
    lines = file.readlines()

# In my opinion, this could also be optimized for a machine with
# limited RAM by reading the lines one by one
lines = [line for line in lines if "[Invalid]" not in line]

output_file = "badfile.txt"
with open(output_file, "w") as file:
    file.writelines(lines)
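The line-by-line version I have in mind would look roughly like this (an untested sketch; badfile.txt.tmp is just a name I made up for the temporary copy). It avoids loading everything into RAM, but it still ends up rewriting all 13 GB:

import os

input_file = "badfile.txt"
temp_file = "badfile.txt.tmp"   # temporary copy, name made up for this example

# Stream the file one line at a time so memory use stays constant;
# only lines without the [Invalid] marker are written to the copy.
with open(input_file, "r") as src, open(temp_file, "w") as dst:
    for line in src:
        if "[Invalid]" not in line:
            dst.write(line)

# Swap the cleaned copy into place (atomic on the same filesystem)
os.replace(temp_file, input_file)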
You can give the fileinput module a try; the documentation is here: https://docs.python.org/3/library/fileinput.html. With inplace=True it reads your file and redirects print() output back into the source file, so be careful!
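A minimal sketch of what I mean, assuming the badfile.txt from your question (the .bak backup suffix is just my choice):

import fileinput

# inplace=True moves badfile.txt aside and redirects stdout into a
# fresh badfile.txt, so everything you print() becomes the new contents.
with fileinput.input("badfile.txt", inplace=True, backup=".bak") as f:
    for line in f:
        if "[Invalid]" not in line:
            print(line, end="")  # keep the original line ending

This still streams through the file and rewrites it once, but it never holds more than one line in memory, and the .bak copy gives you a fallback if something goes wrong.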