Find files using GNU Parallel


I understand that using the following command

find . -name "*.foo" | parallel grep bar

will be executed in 2 steps:

1) find searches for all files matching "*.foo".

2) grep then runs in parallel over that set of files, searching for the string "bar" inside them.

But is it also possible to parallelize the first step itself?


There are 2 best solutions below

1
On BEST ANSWER

If you really think your disks are up to parallel finding and grepping, you could do this:

printf "%s\0" */ | parallel -0 'find {} -name "*.foo" -print0 | parallel -0 grep bar'

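Running a separate grep process for each file is also wasteful. Consider using GNU Parallel's -X option, which lets each grep process search multiple files. Note also that */ expands only to the non-hidden subdirectories, so "*.foo" files sitting directly in the current directory are not searched by this command.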
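
As a minimal sketch of the -X suggestion, assuming GNU Parallel and a grep that supports -H are installed (the function name grep_bar_in_foo is hypothetical, not part of either tool), the file names can be passed NUL-separated so that each grep process receives a batch of files:

```shell
#!/bin/sh
# grep_bar_in_foo DIR: hypothetical helper for illustration only.
# find emits NUL-terminated names (-print0) so names containing spaces
# survive; parallel -0 reads them, and -X places several names on each
# grep command line, so grep is started once per batch instead of once
# per file. -H makes grep print the file name even when a batch happens
# to contain a single file.
grep_bar_in_foo() {
    find "$1" -type f -name "*.foo" -print0 |
        parallel -0 -X grep -H bar
}
```

Calling grep_bar_in_foo . searches the current directory tree, including files at the top level. The exit status is non-zero if any batch contained no match, because grep itself exits with 1 in that case.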

2
On

BLUF: the pipe | feeds the output of one command into the next command as its input.

Here the output of find is a list of files, and grep can work in parallel on each file. If you reverse the order, the output of grep is a list of lines containing your string, and find won't work on that output.

You can do this in a single command:

grep -R --include="*.foo" "bar" /path/to/directory
  • -R means recursive, so it will go into subdirectories of the directory you're grepping through
  • --include="*.foo" means "only search files whose names end in .foo"
  • "bar" is the pattern you're grepping for
  • /path/to/directory is the path to the directory you want to grep through
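
To see how the filter behaves, here is a small sketch on a throwaway tree. The directory and file names are invented for the demo, and --include is assumed to be available (it is a GNU grep option, also found in BSD grep, but not in POSIX):

```shell
#!/bin/sh
# Build a scratch tree; every name below is made up for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/src/nested"
printf 'bar baz\n'  > "$demo/src/a.foo"         # name and content both match
printf 'no match\n' > "$demo/src/nested/b.foo"  # name matches, content does not
printf 'bar baz\n'  > "$demo/src/c.txt"         # content matches, name does not

# Recurse from the tree root, but only read files whose names end in .foo.
result=$(cd "$demo" && grep -R --include="*.foo" "bar" .)
printf '%s\n' "$result"

# Clean up the scratch tree.
rm -rf "$demo"
```

Only the line from a.foo is reported: c.txt is skipped because of its name, and b.foo is read but contains no match.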