I'm trying to write a bash script that looks at a directory full of files and categorises them as either plaintext or binary. A file is plaintext if it ONLY contains plaintext characters, otherwise it is binary. So far I have tried the following permutations of grep:
#!/bin/bash
FILES=`ls`
for i in $FILES
do
########GREP SYNTAX###########
if grep -qv -e[:cntrl:] $i
########/GREP SYNTAX##########
then
mv $i $i-plaintext.txt
else
mv $i $i-binary.txt
fi
done
In the grep syntax line, I have also tried the same without the -v flag and swapping the branches of the if statements, as well as both combinations of the same with [:alnum:] and [:print:]. All six of these variations produce some files labelled binary wich consist solely of plantext and some files labelled plaintext which contain at least one non-printable character.
I need to find a way to identify files that only contain printable characters i.e. A-Z, a-z, 0-9, punctuation, spaces and new lines. All files containing any character that is not in this set shoudl be classified as binary.
I've been bashing my head against a wall trying to sort this for half a day. Help! Thanks in advance, Rik
First you can/should do
instead of putting the output of
ls
in a variable. The chief reason for doing this is to be able to handle filenames that include spaces.Second, you need to enclose the character class in a set of brackets or it's going to look at those characters as literals. And I would enclose them in a set of single quotes to protect against the shell interpreting them. Don't use
-v
and negate theprint
class and see if that works for you.And as shown in that line, always quote variables when they contain filenames.
To keep
grep
from complaining about binary files, use-a
.The variable
i
is often used for an integer or an index. Usef
orfile
.Finally: