find and grep / zgrep / lzgrep progress bar


I would like to add a progress bar to this command line:

find . \( -iname "*.bz" -o -iname "*.zip" -o -iname "*.gz" -o -iname "*.rar" \) -print0 | while read -d '' file; do echo "$file"; lzgrep -a stringtosearch\.anything "$file"; done

The progress bar should be calculated from the total size of all the compressed files (not from a single file).

Of course, it can be a script too.

I would also like to add other progress bars, if possible:

  1. The total number of files processed (for example, 3 out of 21)
  2. The percentage of progress of the single file

Can anybody help me please?

Here are some examples of what it should look like (example from here):

tar cf - /folder-with-big-files -P | pv -s $(du -sb /folder-with-big-files | awk '{print $1}') | gzip > big-files.tar.gz

Multiple progress bars (example from here):

pv -cN orig < foo.tar.bz2 | bzcat | pv -cN bzcat | gzip -9 | pv -cN gzip > foo.tar.gz 

Thanks,


This is the first time I've ever heard of pv, and it's not on any machine I have access to. Assuming it needs to know a total at startup and then advances once per item it sees, you could do something like this to get a progress bar per file processed:

IFS= readarray -d '' files < <(find . -whatever -print0)
# -l makes pv count lines (one per file name) instead of bytes
printf '%s\n' "${files[@]}" | pv -l -s "${#files[@]}" | command

The first line gives you an array of files, so you can then use "${#files[@]}" to give pv its initial total value (it looks like you use -s value for that?) and then do whatever you normally do to get progress as each file is processed.

I don't see any way to tell pv that the pipe it's reading from is NUL-terminated rather than newline-terminated, so if your file names can contain newlines you'd have to figure out how to solve that problem.
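One way to sidestep the delimiter question for the "3 out of 21" counter is to skip the pipe and loop over the array in bash itself. This is only a sketch: `progress_each` is a made-up helper name, and the counter is plain text on stderr rather than a pv bar.

```shell
#!/bin/bash
# progress_each: hypothetical helper, not part of pv or any standard tool.
# Runs the command named in $1 once per remaining argument and prints
# "Processing n out of total" on stderr before each run.
progress_each() {
    local cmd=$1; shift
    local total=$# n=0 f
    for f in "$@"; do
        n=$((n + 1))
        printf 'Processing %d out of %d: %s\n' "$n" "$total" "$f" >&2
        "$cmd" "$f"
    done
}

# Usage sketch (assumes GNU find and bash >= 4.4 for readarray -d):
#   readarray -d '' files < <(find . -iname '*.gz' -print0)
#   progress_each some_command "${files[@]}"
```

Because the file names never leave the array, names containing newlines are passed through unchanged.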

To additionally get progress on a single file you might need something like:

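IFS= readarray -d '' files < <(find . -whatever -print0)
# pv -l counts lines (files); the file name reaches sh as $1 instead of
# being spliced into the script text, and -n 1 is dropped since -I implies it
printf '%s\n' "${files[@]}" |
    pv -l -s "${#files[@]}" |
    xargs -I {} sh -c 'pv "$1" | command' sh {}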

I don't have pv, so all of the above is untested; check the syntax, especially since I had never heard of pv before :-).


Thanks to Max C., I found a solution for the main question:

find ./ -type f \( -iname '*.gz' -o -iname '*.bz' \) | (tot=0; while IFS= read -r fname; do s=$(stat -c %s "$fname"); if [ -n "$s" ]; then echo "$fname"; tot=$((tot+s)); fi; done; echo "$tot") | tac | (read -r size; xargs -I{} cat {} | pv -s "$size" | lzgrep -a something -)

But this works only for gz and bz files; now I have to extend it to use a different tool depending on the extension.
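A possible starting point for the per-extension dispatch is a small `case` lookup. This is a rough sketch: `decompress_cmd` is a made-up name, the extension-to-tool mapping is my assumption, and rar is left out.

```shell
#!/bin/bash
# decompress_cmd: hypothetical helper. Prints a command that writes the
# decompressed content of the given file to stdout, chosen by extension.
# The mapping below is an assumption; adjust it to the tools you have.
decompress_cmd() {
    case "${1,,}" in            # ${1,,} lowercases the name (bash >= 4)
        *.gz)        echo "gzip -dc" ;;
        *.bz|*.bz2)  echo "bzip2 -dc" ;;
        *.xz|*.lzma) echo "xz -dc" ;;
        *.zip)       echo "unzip -p" ;;
        *)           return 1 ;;   # unknown extension: let the caller skip it
    esac
}

# Usage sketch: $cmd is left unquoted on purpose so it splits into
# the program name and its option.
#   cmd=$(decompress_cmd "$file") && $cmd "$file" | grep -a something
```

Returning a non-zero status for unknown extensions lets the calling loop skip those files instead of feeding them to the wrong tool.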

I'm going to try Ed's solution too.


Thanks to Ed and Max C., here is version 0.2. This version works with zgrep, but not with lzgrep. :-\

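#!/bin/bash
echo -n "collecting dump... "
# NUL-delimited read keeps file names with spaces or newlines intact
IFS= readarray -d '' files < <(find . \( -iname "*.bz" -o -iname "*.gz" \) -print0)
echo done
echo "Calculating archives size..."
# Sum the compressed sizes so pv knows the total up front
tot=0
for line in "${files[@]}"; do
    s=$(stat -c %s "$line")
    if [ -n "$s" ]; then
        tot=$((tot + s))
    fi
done

# Emit only the files stat could read, then stream them all through one pv.
# The file name is passed to sh as $1 rather than spliced into the script text.
(for line in "${files[@]}"; do
    s=$(stat -c %s "$line")
    if [ -n "$s" ]; then
        echo "$line"
    fi
done
) | xargs -I{} sh -c 'echo "Processing file: $1" 1>&2; cat "$1"' sh {} | pv -s "$tot" | zgrep -a anything -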
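The size calculation could also be pulled into a function, so the total is computed in one place and passed straight to pv. A minimal sketch, assuming GNU stat; `total_size` is a made-up name.

```shell
#!/bin/bash
# total_size: hypothetical helper. Prints the summed size in bytes of the
# files given as arguments. Assumes GNU stat (-c %s); files that stat
# cannot read are skipped instead of aborting the sum.
total_size() {
    local tot=0 s f
    for f in "$@"; do
        s=$(stat -c %s -- "$f" 2>/dev/null) || continue
        tot=$((tot + s))
    done
    echo "$tot"
}

# Usage sketch:
#   cat "${files[@]}" | pv -s "$(total_size "${files[@]}")" | zgrep -a anything -
```

Passing the array as arguments avoids a second loop over the file list and keeps odd file names intact.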