I'm taking an intro course to UNIX and have a homework question that follows:
How many files in the previous question are text files? A text file is any file containing human-readable content. (TRICK QUESTION. Run the file command on a file to see whether the file is a text file or a binary data file! If you simply count the number of files with the .txt extension you will get no points for this question.)
The previous question simply asked how many regular files there were, which was easy to figure out by doing find . -type f | wc -l.
I'm just having trouble determining what "human readable content" is, since I'm assuming it means anything besides binary/assembly, but I thought that's what -type f displays. Maybe that's what the professor meant by saying "trick question"?
This question has a follow up later that also asks "What text files contain the string "csc" in any mix of upper and lower case?". Obviously "text" is referring to more than just .txt files, but I need to figure out the first question to determine this!
Quotes added for clarity:
The file command will inspect files and tell you what kind of file they appear to be. The word "text" will (almost) always be in the description for text files.
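As a rough illustration (the scratch directory and both file names below are made up for this sketch, and the exact descriptions file prints vary between versions), here is file looking at one plain-text file and one binary file that merely carries a .txt name:

```shell
# Scratch directory with two hypothetical sample files (names are illustrative only).
demo=$(mktemp -d)
printf 'hello world\n' > "$demo/notes"        # plain text, no extension at all
printf '\000\001\002\003' > "$demo/blob.txt"  # binary bytes despite the .txt name

# -b prints only the description, without the leading "filename:" prefix.
notes_type=$(file -b "$demo/notes")
blob_type=$(file -b "$demo/blob.txt")

echo "notes:    $notes_type"
echo "blob.txt: $blob_type"
```

On a typical system the first description contains the word text (something like ASCII text) and the second does not, whatever the extension says.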
So the first part is asking you to run the file command and parse its output.
find -type f finds regular files. It filters out other filesystem objects like directories, symlinks, and sockets. It will match any type of file, though: binary files, text files, anything.
It sounds like he's just saying don't do find -name '*.txt' or some such command to find text files. Don't assume a particular file extension. File extensions have much less meaning in UNIX than they do in Windows. Lots of files don't even have file extensions!
How about a multi-part answer? I'll give the straightforward solution in #1, which is probably what your professor is looking for. And if you are interested I'll explain its shortcomings and how you can improve upon it.
One way is to use xargs, if you've learned about that. xargs runs another command, using the data from stdin as that command's arguments, so the output of find . -type f can be piped into xargs file.
That works. Sort of. It'd be good enough for a homework assignment. But not good enough for a real world script.
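Here is a small reproduction of that pipeline. The scratch directory and the readme file are hypothetical, chosen only so there is one ordinary name and one name with a space in it:

```shell
# Scratch directory with hypothetical files; one name contains a space.
demo=$(mktemp -d)
printf 'plain\n' > "$demo/readme"
printf 'entry\n' > "$demo/VMWare (copy).desktop"

# Naive pipeline: xargs splits its input on whitespace, so the second name
# reaches file as two separate arguments.
# "|| true" keeps the script going even if file or xargs reports a failure.
naive_output=$(cd "$demo" && find . -type f | xargs file 2>&1) || true

printf '%s\n' "$naive_output"
```

file ends up reporting on ./VMWare and (copy).desktop as if they were two separate files, neither of which exists.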
Notice how it broke on the file VMWare (copy).desktop because it has a space in it. This is due to xargs's default behavior of splitting the arguments on whitespace. We can fix that by having find terminate each name with a NUL character (-print0) and using xargs -0 to split command arguments on NUL characters instead of whitespace. File names can't contain NUL characters, so this will be able to handle anything.
This is good enough for a production script, and is something you'll encounter a lot. But I personally prefer an alternative syntax, find's own -exec, which doesn't require a pipe.
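A sketch of both variants, side by side. The sample files are hypothetical, and -print0 and -0 are assumed to be available, which is the case for the GNU and BSD versions of find and xargs:

```shell
# Scratch directory with hypothetical files; one name contains a space.
demo=$(mktemp -d)
printf 'plain\n' > "$demo/readme"
printf 'entry\n' > "$demo/VMWare (copy).desktop"

# NUL-delimited hand-off: -print0 and -0 keep each file name in one piece.
xargs_output=$(cd "$demo" && find . -type f -print0 | xargs -0 file)

# Pipe-free alternative: find runs file itself, once for each file it finds.
exec_output=$(cd "$demo" && find . -type f -exec file {} \;)

printf '%s\n' "$xargs_output" "$exec_output"
```

Both forms report on VMWare (copy).desktop as a single file.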
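To understand that, -exec file {} \; calls file repeatedly, replacing {} with each file name it finds. The semicolon \; marks the end of the file command; it is escaped so that the shell passes it through to find instead of treating it as a command separator. Note that with \; this starts one file process per file. Ending the command with + instead of \; passes many file names to each invocation, much as xargs does.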
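To round this off, here is one hedged way the "parse its output" step could look for the counting question. The sample files are hypothetical, and matching on the bare word text is a simplification that relies on file's usual wording:

```shell
# Scratch directory with three hypothetical files: two text, one binary.
demo=$(mktemp -d)
printf 'plain\n' > "$demo/readme"
printf 'entry\n' > "$demo/VMWare (copy).desktop"
printf '\000\001\002\003' > "$demo/blob.txt"

# -b drops the "name:" prefix, so a file that is merely *named* text cannot match.
# "+" hands many names to each file invocation; grep -c counts matching lines.
text_count=$(cd "$demo" && find . -type f -exec file -b {} + | grep -c 'text')

echo "text files: $text_count"
```

Here two of the three sample files count as text; the .txt extension on the binary one makes no difference.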