Iterate over specific files in a directory using Bash find


Shellcheck doesn't like my for-over-find loop in Bash.

for f in $(find $src -maxdepth 1 -name '*.md'); do wc -w < "$f" >> $path/tmp.txt; done

It suggests instead:

1  while IFS= read -r -d '' file
2  do
3      let count++
4      echo "Playing file no. $count"
5      play "$file"
6  done <   <(find mydir -mtime -7 -name '*.mp3' -print0)
7  echo "Played $count files"

I understand most of it, but some things are still unclear.

In line one: What is '' file?

In line six: What does the empty space do in < < (find). Are the < redirects, as usual? If they are, what does it mean to redirect into do block?

Can someone help parse this out? Is this the right way to iterate over files of a certain kind in a directory?


There are 2 best solutions below

Best answer

In line one: What is '' file?

According to help read, that '' is an argument to the -d parameter:

-d delim    continue until the first character of 
            DELIM is read, rather than newline
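
To make that concrete, here is a minimal sketch of read consuming NUL-terminated records. The sample data and the records array are made up for illustration; the point is only that -d '' makes read stop at each NUL byte instead of at each newline.

```bash
#!/usr/bin/env bash
# Three NUL-terminated records; the second one contains an embedded newline.
records=()
while IFS= read -r -d '' item; do
    records+=("$item")
done < <(printf 'one\0two\nlines\0three\0')

# The embedded newline did not split the second record.
echo "read ${#records[@]} records"
printf '[%s]\n' "${records[@]}"
```

Because the newline is no longer a delimiter, a filename containing one would survive intact as a single record.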

In line six: What does the empty space do in < < (find).

There are two separate operators there. There is <, the standard I/O redirection operator, followed by a <(...) construct, which is a bash-specific construct that performs process substitution:

Process Substitution

    Process  substitution  is  supported on systems that
    support named pipes (FIFOs) or the /dev/fd method of naming
    open files.  It takes the form of <(list) or >(list).  The
    process list is run with its  input  or output  connected
    to  a FIFO or some file in /dev/fd...

So this sends the output of the find command to the standard input of the while loop.
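
A small sketch may help show the two pieces separately. The commands inside <(...) are arbitrary stand-ins, and the generated file name differs between systems, so no exact output is claimed for the first line.

```bash
#!/usr/bin/env bash
# <(...) expands to a file name (typically something under /dev/fd)
# from which the command's output can be read.
echo <(true)

# Any command that expects a file argument can read from that name.
cat <(printf 'hello\n')

# "< <(...)" redirects that file to stdin, here for a single read.
IFS= read -r first_line < <(printf 'first\nsecond\n')
echo "$first_line"
```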

Are the < redirects, as usual? If they are, what does it mean to redirect into do block?

Redirecting into a loop means that any command inside that loop that reads from stdin will read from the redirected input source. Note that this is different from piping into the loop (find ... | while ...): in a pipeline the loop runs in a subshell, so variables set inside it aren't visible afterwards. With a redirection from a process substitution the loop runs in the current shell, which is why $count is still available on line 7.
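
A quick way to check how variable scope behaves when a loop is fed by a pipe versus a redirection from a process substitution is a sketch like the following. The variable names piped and redirected are made up for the demonstration, and it assumes bash's default settings (in particular, the lastpipe option is off).

```bash
#!/usr/bin/env bash
# Pipeline: the while loop is one stage of a pipeline, so bash runs it
# in a subshell and the increments are lost when the pipeline ends.
piped=0
printf 'a\nb\nc\n' | while IFS= read -r line; do
    piped=$((piped + 1))
done
echo "after pipe: $piped"

# Redirection from a process substitution: the loop runs in the
# current shell, so the counter keeps its value afterwards.
redirected=0
while IFS= read -r line; do
    redirected=$((redirected + 1))
done < <(printf 'a\nb\nc\n')
echo "after redirection: $redirected"
```

With default settings this reports 0 after the pipe and 3 after the redirection.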

Can someone help parse this out? Is this the right way to iterate over files of a certain kind in a directory?

For the record, I would typically do this by piping find to xargs, although which solution is best depends to a certain extent on what you're trying to do. The two examples in your question do completely different things, and it's not clear what you're actually trying to accomplish.

But for example:

find "$src" -maxdepth 1 -name '*.md' -print0 |
  xargs -0 -I DOC wc -w DOC

This would run wc on each of the *.md files. The -print0 option to find (and the matching -0 to xargs) lets this command correctly handle filenames with embedded whitespace (e.g., This is my file.md). If you know you don't have any of those, you can just do:

find "$src" -maxdepth 1 -name '*.md' |
  xargs -I DOC wc -w DOC

Second answer

Generally, you need to use find if you want to do a recursive search through a directory tree (although with modern bash, you can set the shell option globstar, as shellcheck suggests). But in this case you've specified -maxdepth 1, so your find command is just listing files which match the pattern "$src"/*.md. That being the case, it is much simpler and more reliable to use the glob (pattern):

for f in "$src"/*.md; do
  wc -w < "$f"
done >> "$path"/tmp.txt

(I also quoted all the variable expansions, for safety, and moved the output redirection so it applies to the entire for loop, which is slightly more efficient.)
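
One edge case with the glob approach: if no file matches, bash leaves the pattern as a literal string, and the loop then tries to read a file literally named *.md. A minimal sketch that guards against this with the nullglob shell option follows; the temporary directory, the sample files, and the out variable are hypothetical stand-ins for $src and $path/tmp.txt.

```bash
#!/usr/bin/env bash
# Hypothetical sample data, for illustration only.
src=$(mktemp -d)
out=$(mktemp)
printf 'one two three\n' > "$src/a file.md"
printf 'four five\n'     > "$src/b.md"

shopt -s nullglob      # an unmatched glob now expands to nothing
for f in "$src"/*.md; do
    wc -w < "$f"       # reading from stdin keeps the file name out of the output
done >> "$out"
shopt -u nullglob      # restore the default behaviour

cat "$out"
```

The file name with a space in it needs no special handling here, because the glob expands to one word per file.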

If you need to use find (because a glob won't work), then you should attempt to use the -exec option to find, which doesn't require fiddling around with other options to avoid mishandled special characters in filenames. For example, you could do this:

find "$src" -maxdepth 1 -name '*.md' -exec wc -w {} + >> "$path"/tmp.txt
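
One difference worth keeping in mind: when wc receives file names as arguments, it prints each name next to its count (and a total line when there are several files), whereas wc -w < "$f" prints only the number. If bare counts are wanted, one possible sketch is to have -exec start a small shell per file. The temporary directory, sample files, and out variable below are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sample data, for illustration only.
src=$(mktemp -d)
out=$(mktemp)
printf 'one two three\n' > "$src/a file.md"
printf 'four five\n'     > "$src/b.md"

# One sh invocation per file; inside it, "$1" is the name that find
# substituted for {}. The trailing "sh" fills in $0.
find "$src" -maxdepth 1 -name '*.md' \
    -exec sh -c 'wc -w < "$1"' sh {} \; >> "$out"

cat "$out"
```

This starts one extra process per file, so it trades some speed for output that matches the for loop; the order of the counts follows whatever order find happens to list the files in.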

To answer your specific questions:

  1. In IFS= read -r -d '' file, the '' is the argument to the -d option. That option specifies the character which terminates each record to be read; by default, a newline character is used so that read reads one line at a time. The empty string is the same as specifying the NUL character, which is what find outputs at the end of each filename if you specify the -print0 option. (Unlike -exec, -print0 has not historically been part of the POSIX standard, so it is not guaranteed to work with every find implementation, but in practice it's pretty generally available.)

  2. The space between < and <(...) is to avoid creating the token <<, which would indicate a here-document. Instead, it specifies a redirection (<) from a process substitution (<(...)).
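
Putting both points together, the read/process-substitution pattern applied to the word-count task from the question might look like the sketch below. The temporary directory, sample files, and out variable are hypothetical stand-ins for $src and $path/tmp.txt.

```bash
#!/usr/bin/env bash
# Hypothetical sample data, for illustration only.
src=$(mktemp -d)
out=$(mktemp)
printf 'one two three\n' > "$src/a file.md"
printf 'four five\n'     > "$src/b.md"

count=0
while IFS= read -r -d '' file; do
    count=$((count + 1))
    wc -w < "$file"
done < <(find "$src" -maxdepth 1 -name '*.md' -print0) >> "$out"

# The loop ran in the current shell, so count is still set here.
echo "Counted words in $count files"
```

The loop takes its input from find and appends its combined output to the file in a single redirection, and the final line reports 2 files for this sample data.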