Using ack/grep with globstar option for big directory tree (performance issue?)


I'd like to use Bash's globstar option ('**') to search files under certain directories, as is customary with grep. However, the following command just started and appeared to do nothing:

ack PATTERN ~/projects/**/trunk/

How can I search files with ack using a similar path pattern?

EDIT: I tried the same command on a small test directory with only a few files, and it works fine. So it looks like the ** operator demands too many resources (it recurses, I guess), and when I execute the command it just iterates through the file tree. By the way, the following command slows the computer down, which is why I tried ack in the first place:

grep -r PATTERN ~/projects/**/trunk/

So, I'd like to know whether there is some workaround to achieve my goal (not necessarily with **).


There are 2 best solutions below

5
On

This should work:

\find -X . -type d -name trunk | xargs -L 1 ack PATTERN

The -X argument is specific to BSD find. From the find manual:

-X Permit find to be safely used in conjunction with xargs(1). If a file name contains any of the delimiting characters used by xargs(1), a diagnostic message is displayed on standard error, and the file is skipped. The delimiting characters include single (') and double (") quotes, backslash (\), space, tab and newline characters.

EDIT: Based on the comments, a better solution may be the following (works with both BSD and GNU find/xargs, AFAICT):

\find . -type d -name trunk -print0 | xargs -0 -L 1 ack PATTERN

I guess it will depend on whether your end command can handle arguments containing funny filenames.
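To see why the NUL-delimited variant is safer, here is a minimal sketch on a throwaway tree. The directory and file names are made up for illustration, grep -rl stands in for ack (on the assumption that ack may not be installed everywhere), and -n 1 is used instead of -L 1 to get the same one-directory-per-invocation effect:

```bash
#!/usr/bin/env bash
# Build a temporary tree (hypothetical names) with a space in one path.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/my project/trunk" "$demo_root/other/trunk"
echo "PATTERN here" > "$demo_root/my project/trunk/a.txt"
echo "nothing"      > "$demo_root/other/trunk/b.txt"

# -print0 / -0 pass each path as one NUL-terminated item, so the directory
# "my project/trunk" reaches grep as a single argument despite the space.
# grep exits non-zero for a trunk without matches, which makes xargs report
# a failure status; "|| true" keeps that from aborting a script.
matches=$(find "$demo_root" -type d -name trunk -print0 |
          xargs -0 -n 1 grep -rl PATTERN) || true
echo "$matches"
```

Without -print0 and -0, xargs would split "my project/trunk" at the space and hand the search tool two paths that don't exist.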

0
On

The shell expands ** -- that is if you've first set shopt -s globstar and are using Bash 4.0 or higher. In that case, you can look to see exactly what files the shell is matching to your globstar. Try these:

$ ls -d ~/projects/**/trunk/

or:

$ echo ~/projects/**/trunk/

Then, go get a cup of coffee because globstar can take quite a long time to execute. You might find that you're not even hitting a file where grep or ack returns a match. That's minutes of your life you've wasted and you'll never get back. Might as well have watched the Star Wars Holiday Special. At least with that you could make some sort of drinking game.

The problem with globstar is that the shell is forced to walk down your entire directory tree trying to match **. The shell also hands nothing to the command until the expansion is complete, so ack or grep doesn't even start until every matching path has been found. It's slow and very inefficient.
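A small sketch of that behaviour, assuming Bash 4.0 or later and a throwaway tree with made-up directory names. The array assignment only finishes once the shell has walked the whole tree, which is the same wait you get before ack or grep is started:

```bash
#!/usr/bin/env bash
# globstar is off by default and needs Bash 4.0+.
shopt -s globstar

glob_root=$(mktemp -d)
mkdir -p "$glob_root/foo/trunk" "$glob_root/bar/baz/trunk" "$glob_root/trunk"

# The shell expands the pattern into a complete list first; a command such
# as ack or grep would only receive these paths after the walk has finished.
# The trailing slash restricts the matches to directories.
trunk_dirs=("$glob_root"/**/trunk/)
trunk_count=${#trunk_dirs[@]}
printf '%s\n' "${trunk_dirs[@]}"
```

On a tree this small the expansion is instant. On a large tree, all of the waiting happens in that one assignment.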

Here's what happened on my system:

$ time ls -d ~/projects/**/trunk
/Users/david/projects/foo/trunk           /Users/david/projects/barfoo/trunk
/Users/david/projects/bar/trunk           /Users/david/projects/foofoo/trunk
/Users/david/projects/foobar/trunk        /Users/david/projects/trunk

real    0m18.19s
user    0m0.52s
sys     0m15.83s

I stood staring at the terminal for almost 20 seconds before those six results popped up. When I tried ~/**/trunk, it ran for 20 minutes without returning a single result before I killed it.

And here's the equivalent with find:

$ time find ~/projects -name trunk
/Users/david/projects/foo/trunk
/Users/david/projects/bar/trunk
/Users/david/projects/foobar/trunk
/Users/david/projects/barfoo/trunk
/Users/david/projects/foofoo/trunk
/Users/david/projects/trunk

real    0m4.09s
user    0m0.20s
sys     0m0.91s

find was more than four times faster than globstar (about 4 seconds versus 18). Plus, the results are printed as soon as they're found. You're better off using find:

$ find ~/projects -name trunk -type d -exec ack PATTERN {} \;
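If you run this often, the find approach can be wrapped in a small function. This is only a sketch: search_trunks is a made-up helper name, grep -rl stands in for ack so the example runs without ack installed, and the added -prune is my own assumption that you don't need find to descend into a trunk directory once the search tool is already recursing through it.

```bash
#!/usr/bin/env bash
# search_trunks is a hypothetical helper, not part of ack or find.
# Usage: search_trunks PATTERN ROOT
search_trunks() {
    local pattern=$1 root=$2
    # -prune stops find from walking inside a trunk it has already handed
    # over; the search tool recurses into that directory on its own.
    # Swap "grep -rl --" for "ack" if ack is installed.
    find "$root" -type d -name trunk -prune -exec grep -rl -- "$pattern" {} \;
}

# Throwaway tree with made-up names to try it out.
tree=$(mktemp -d)
mkdir -p "$tree/foo/trunk/src" "$tree/bar/trunk"
echo "needle" > "$tree/foo/trunk/src/x.txt"
echo "hay"    > "$tree/bar/trunk/y.txt"

found=$(search_trunks needle "$tree")
echo "$found"
```

Note the space between {} and \; in the -exec clause. find needs them as two separate arguments.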