Why is my `find` command giving me errors relating to ignored directories?


I have this find command:

find . -type f  -not -path '**/.git/**' -not -path '**/node_modules/**'  | xargs sed -i '' s/typescript-library-skeleton/xxx/g;

for some reason it's giving me these warnings/errors:

find: ./.git/objects/3c: No such file or directory
find: ./.git/objects/3f: No such file or directory
find: ./.git/objects/41: No such file or directory

I even tried using:

-not -path '**/.git/objects/**'

and got the same thing. Anybody know why the find is searching in the .git directory? Seems weird.


3
On BEST ANSWER

why is the find searching in the .git directory?

GNU find is clever and supports several optimizations over a naive implementation:

  • It can flip the order of -size +512b -name '*.txt' and check the name first, because querying the size will require a second syscall.
  • It can count the hard links of a directory to determine the number of subdirectories, and once it has seen them all it no longer needs to check the remaining entries for -type d or for recursing.
  • It can even rewrite (-B -or -C) -and -A so that if the checks are equally costly and free of side effects, the -A will be evaluated first, hoping to reject the file after 1 test instead of 2.

However, it is not yet clever enough to realize that -not -path '*/.git/*' means that if you find a directory .git then you don't even need to recurse into it because all files inside will fail to match.

Instead, it dutifully recurses, finds each file and matches it against the pattern as if it was a black box.

To explicitly tell it to skip a directory entirely, you can instead use -prune. See "How to exclude a directory in find . command".
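The difference in traversal is easy to observe by putting -print first, so that every path find visits is echoed before any other test runs. This is only a minimal sketch in a scratch directory; the tree layout and the variable names are made up for illustration:

```shell
# Build a throwaway tree (all names here are hypothetical).
demo=$(mktemp -d "${TMPDIR:-/tmp}/find-demo.XXXXXX") || exit 1
mkdir -p "$demo/.git/objects/3c" "$demo/src"
: > "$demo/.git/objects/3c/blob"
: > "$demo/src/a.txt"
cd "$demo" || exit 1

# -print comes first, so it fires for every path find visits.
visited_filter=$(find . -print -not -path '*/.git/*' | wc -l | tr -d ' ')
visited_prune=$(find . -print -name .git -prune | wc -l | tr -d ' ')

echo "visited with -not -path: $visited_filter"   # whole tree, .git included
echo "visited with -prune:     $visited_prune"    # stops at ./.git itself

# Clean up the scratch directory.
cd / && rm -rf "$demo"
```

On this small tree the first count is 7 and the second is 4: with -prune, find still sees ./.git but never descends into it.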

4
On

Both more efficient and more correct would be to avoid the default -print action, change -not -path ... to -prune, and ensure that xargs is only used with NUL-delimited input:

find . -name .git -prune -o \
       -name node_modules -prune -o \
       -type f -print0 | xargs -0 sed -i '' 's/typescript-library-skeleton/xxx/g'

Note the following points:

  • We use -prune to tell find to not even recurse down the undesired directories, rather than -not -path ... to tell it to discard names in those directories after they were found.
  • We put the -prunes before the -type f, so we're able to match directories for pruning.
  • We have an explicit action, not depending on the default -print. This is important because the default -print effectively has a set of parentheses: if no explicit action is given, find ... behaves like find '(' ... ')' -print, not like find ... -print.
  • We use xargs only with the -0 argument enabling NUL-delimited input, and the -print0 action on the find side to generate a NUL-delimited list of names. NUL is the only character which cannot be present in an arbitrary file path (yes, newlines can be present) -- and thus the only character which is safe to use to separate paths. (If the -0 extension to xargs and the -print0 extension to find are not guaranteed to be available, use -exec sed -i '' ... {} + instead).
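The point about the implicit parentheses can be checked in a scratch directory. The following is a minimal sketch, assuming a POSIX shell; the directory, file, and variable names are invented for the demonstration:

```shell
# Build a throwaway tree (all names here are hypothetical).
demo=$(mktemp -d "${TMPDIR:-/tmp}/find-demo.XXXXXX") || exit 1
mkdir -p "$demo/.git/objects" "$demo/src"
: > "$demo/.git/objects/blob"
: > "$demo/src/a.txt"
cd "$demo" || exit 1

# No action given: find wraps the whole expression in parentheses and
# appends -print, so the pruned directory itself is printed as well.
implicit=$(find . -name .git -prune -o -type f | sort)

# Explicit -print binds only to the -type f branch.
explicit=$(find . -name .git -prune -o -type f -print | sort)

printf 'implicit:\n%s\n\nexplicit:\n%s\n' "$implicit" "$explicit"

# Clean up the scratch directory.
cd / && rm -rf "$demo"
```

Without the explicit action, ./.git shows up in the output next to ./src/a.txt, and a downstream sed would then be handed a directory name.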