How to search for a string in a .gz file?


I am new to scripting and am trying to read a .gz file and copy the lines that contain "Alas!". The files are at myfiles/all*/input.gz: the script should look in every directory whose name starts with all for an input.gz file, search that file for the string "Alas!", and write the matching lines to a text file. I know how to do this on Linux with the zgrep command: zgrep 'Alas!' myfiles/all*/input.gz > file1.txt. I got lost while trying to write a script for this.


There are 2 best solutions below

Best answer

The statement

    if 'Alas!':

merely checks whether the string 'Alas!' is "truthy" (it is, because any non-empty string is); you want to check whether the variable line contains this substring:

    if 'Alas!' in line:
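As a quick illustration of the difference, here is a minimal sketch; the two sample lines are made up for the demonstration and are not from the asker's data.

```python
# Two made-up sample lines; only the first contains the search string.
lines = ["Alas! poor Yorick\n", "nothing to see here\n"]

# A non-empty string literal is truthy no matter what `line` holds.
always_true = [bool('Alas!') for line in lines]

# The `in` operator actually inspects each line.
contains = ['Alas!' in line for line in lines]

print(always_true)  # [True, True]
print(contains)     # [True, False]
```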

Another problem is that you are opening the output file multiple times, overwriting any results from previous input files. You want to open it only once, at the beginning (or open for appending; but repeatedly opening and closing the same file is unnecessary and inefficient).

A better design altogether might be to simply print to standard output, and let the user redirect the output to a file if they like. (Also, probably accept the input files as command-line arguments, rather than hardcoding a fugly complex relative path.)
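A rough sketch of that design, assuming the input files are passed as command-line arguments and the search string stays fixed; the function names matching_lines and main are hypothetical, chosen only for this example.

```python
import gzip
import sys


def matching_lines(paths, needle):
    """Yield every line from the gzip files in `paths` that contains `needle`."""
    for path in paths:
        with gzip.open(path, 'rt') as f:
            for line in f:
                if needle in line:
                    yield line


def main(argv):
    # argv[1:] holds the input files, typically expanded by the shell
    # from a pattern such as myfiles/all*/input.gz.
    for line in matching_lines(argv[1:], 'Alas!'):
        sys.stdout.write(line)  # the line already ends with a newline


if __name__ == '__main__':
    main(sys.argv)
```

Saved under a name of your choice (say search_gz.py, a hypothetical filename), it could be run as python search_gz.py myfiles/all*/input.gz > file1.txt, leaving both the file selection and the output destination to the shell.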

A third problem is that the input line already contains a newline, but print() will add another. Either strip the newline before printing, or tell print not to supply another (or switch to write which doesn't add one).
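The three options can be compared side by side. This is only a sketch: it writes to in-memory io.StringIO buffers instead of a real file, and the sample line is made up.

```python
import io

line = "Alas! poor Yorick\n"  # a line as read from a text-mode file

doubled = io.StringIO()
print(line, file=doubled)                # print appends its own '\n'

stripped = io.StringIO()
print(line.rstrip('\n'), file=stripped)  # strip first, let print add one back

no_end = io.StringIO()
print(line, file=no_end, end='')         # tell print not to add anything

written = io.StringIO()
written.write(line)                      # write never adds a newline

print(repr(doubled.getvalue()))   # 'Alas! poor Yorick\n\n'
print(repr(stripped.getvalue()))  # 'Alas! poor Yorick\n'
```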

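    import gzip
    import glob

    # Open the output file once, before looping over the inputs.
    with open('file1.txt', 'w') as o:
        for file in glob.glob('myfiles/all*/input.gz'):
            # 'rt' decompresses and decodes to text, one line at a time.
            with gzip.open(file, 'rt') as f:
                for line in f:
                    if 'Alas!' in line:
                        # line already ends with a newline
                        print(line, file=o, end='')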

Demo: https://ideone.com/rTXBSS

Another answer

The .gz file is compressed, so you cannot search its contents by opening it as a plain file. You need to decompress it first. Python provides gzip.open, which opens a gzip-compressed file and decompresses it as you read.
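To see the difference, here is a small sketch that writes a throwaway gzip file in a temporary directory (the path and contents are made up for the demonstration) and reads it back both ways.

```python
import gzip
import os
import tempfile

# Create a small throwaway .gz file to experiment with.
path = os.path.join(tempfile.mkdtemp(), 'input.gz')
with gzip.open(path, 'wt') as f:
    f.write('Alas! poor Yorick\n')

# Plain open() returns the compressed bytes, starting with the gzip magic number.
with open(path, 'rb') as raw:
    raw_bytes = raw.read()

# gzip.open() in text mode returns the decompressed, decoded contents.
with gzip.open(path, 'rt') as f:
    text = f.read()

print(raw_bytes[:2])    # b'\x1f\x8b'
print('Alas!' in text)  # True
```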

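    import glob
    import gzip

    # Open the output once so results from earlier files are not overwritten.
    with open('file1.txt', 'w') as o:
        for file in glob.glob('myfiles/all*/input.gz'):
            with gzip.open(file, 'rt') as f:
                for line in f:
                    if 'Alas!' in line:  # Changed this
                        # line already ends with '\n', so suppress print's own
                        print(line, file=o, end='')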

You also need to change if 'Alas!' to if 'Alas!' in line. The former is always truthy, so every line would be written to the output file. The latter writes a line only if it contains Alas!.

For what it's worth, zgrep works in a similar way. It decompresses the file and pipes the result to grep (see https://stackoverflow.com/a/45175234/5666087).
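The same two-stage idea can be sketched in Python with generators. This is an illustration, not how zgrep is implemented: the names decompress, grep and zgrep_like are hypothetical, and Python's re syntax differs from grep's default regular expressions, although a plain pattern such as Alas! behaves the same in both.

```python
import gzip
import re


def decompress(path):
    """Stage 1: yield the decompressed text lines of a gzip file."""
    with gzip.open(path, 'rt') as f:
        yield from f


def grep(pattern, lines):
    """Stage 2: keep only the lines where the regular expression matches."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))


def zgrep_like(pattern, path):
    """Chain the two stages, feeding decompressed lines into the filter."""
    return grep(pattern, decompress(path))
```

Because both stages are lazy, lines are decompressed and filtered one at a time, so the whole file never has to fit in memory.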