I am new to scripting and am trying to read a .gz file and copy the lines that contain "Alas!". The path is myfiles/all*/input.gz. In that path, the script should look in every directory whose name starts with "all" for an input.gz file, search each input.gz for the string "Alas!", and write the matching lines to a text file. I know how to do this on Linux with the zgrep command:

    zgrep 'Alas!' myfiles/all*/input.gz > file1.txt

I got lost somewhere while trying to write a script for this.
How to search for a string in a .gz file?
Asked by perkins royal
There are 2 best solutions below

The .gz file is compressed, so you cannot search its contents by opening it directly. You need to decompress it before searching. Python provides gzip.open to open and decompress gzip-compressed files.
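
    import glob
    import gzip

    files = glob.glob('myfiles/all*/input.gz')
    # Open the output once so results from earlier inputs are not overwritten.
    with open('file1.txt', 'w') as o:
        for file in files:
            with gzip.open(file, 'rt') as f:  # 'rt' = read as decompressed text
                for line in f:
                    if 'Alas!' in line:  # Changed this
                        # line already ends with a newline, so suppress print's own
                        print(line, end='', file=o)
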
You also need to change if 'Alas!' to if 'Alas!' in line. The former always evaluates to True, so every line will be added to the other file. The latter will add a line to the other file only if Alas! is found in the line.
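To make the difference concrete, here is a small sketch using a made-up list of lines (the sample data and variable names are only for illustration, not from the question):

```python
# Hypothetical sample lines, each ending with a newline as lines read from a file do.
lines = ["Alas! poor Yorick\n", "nothing to see here\n", "Alas!\n"]

# A non-empty string literal is always truthy, so this filter keeps every line.
always = [line for line in lines if 'Alas!']

# The membership test inspects each line, so only matching lines are kept.
matching = [line for line in lines if 'Alas!' in line]

print(len(always))    # 3
print(len(matching))  # 2
```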
For what it's worth, zgrep works in a similar way. It decompresses the file and then pipes the result to grep (see https://stackoverflow.com/a/45175234/5666087).
The statement if 'Alas!': merely checks whether the string value 'Alas!' is "truthy" (it is, by definition, as is any non-empty string); you want to check whether the variable line contains this substring.
Another problem is that you are opening the output file multiple times, overwriting any results from previous input files. You want to open it only once, at the beginning (or open it for appending; but repeatedly opening and closing the same file is unnecessary and inefficient).
A better design altogether might be to simply print to standard output, and let the user redirect the output to a file if they like. (Also, probably accept the input files as command-line arguments, rather than hardcoding a fugly complex relative path.)
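A minimal sketch of that design, assuming the script is only ever given gzip-compressed text files. The function name grep_gz and its out parameter are made up for this example, and unlike zgrep it does not prefix each match with the file name:

```python
import gzip
import sys


def grep_gz(paths, needle, out=None):
    """Write every line containing needle, from each gzip file in paths, to out."""
    if out is None:
        out = sys.stdout
    for path in paths:
        # 'rt' decompresses on the fly and decodes to text
        with gzip.open(path, 'rt') as f:
            for line in f:
                if needle in line:
                    out.write(line)  # line already ends with a newline


if __name__ == '__main__':
    # Input files come from the command line; output goes to standard output.
    grep_gz(sys.argv[1:], 'Alas!')
```

If this were saved as, say, alas.py (a hypothetical name), the shell would expand the wildcard and handle the redirection: python alas.py myfiles/all*/input.gz > file1.txt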
A third problem is that the input line already contains a newline, but print() will add another. Either strip the newline before printing, or tell print not to supply another (or switch to write, which doesn't add one).
Demo: https://ideone.com/rTXBSS
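The three options can be compared side by side. This is only an illustrative sketch that writes to in-memory buffers (io.StringIO) instead of a real file, with a made-up sample line:

```python
import io

line = "Alas! poor Yorick\n"  # a line as read from a file, newline included

doubled = io.StringIO()
print(line, file=doubled)                # print appends its own newline: blank line follows

stripped = io.StringIO()
print(line.rstrip('\n'), file=stripped)  # strip first, then let print add one

no_end = io.StringIO()
print(line, end='', file=no_end)         # tell print not to add anything

written = io.StringIO()
written.write(line)                      # write never adds a newline

print(repr(doubled.getvalue()))  # 'Alas! poor Yorick\n\n'
print(repr(no_end.getvalue()))   # 'Alas! poor Yorick\n'
```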