I'm trying to write a bash script that looks at a directory full of files and categorises them as either plaintext or binary. A file is plaintext if it ONLY contains plaintext characters, otherwise it is binary. So far I have tried the following permutations of grep:
#!/bin/bash
FILES=`ls`
for i in $FILES
do
########GREP SYNTAX###########
if grep -qv -e[:cntrl:] $i
########/GREP SYNTAX##########
then
mv $i $i-plaintext.txt
else
mv $i $i-binary.txt
fi
done
In the grep syntax line, I have also tried the same without the -v flag and swapping the branches of the if statements, as well as both combinations of the same with [:alnum:] and [:print:]. All six of these variations produce some files labelled binary wich consist solely of plantext and some files labelled plaintext which contain at least one non-printable character.
I need to find a way to identify files that only contain printable characters i.e. A-Z, a-z, 0-9, punctuation, spaces and new lines. All files containing any character that is not in this set shoudl be classified as binary.
I've been bashing my head against a wall trying to sort this for half a day. Help! Thanks in advance, Rik
First you can/should do
instead of putting the output of
lsin a variable. The chief reason for doing this is to be able to handle filenames that include spaces.Second, you need to enclose the character class in a set of brackets or it's going to look at those characters as literals. And I would enclose them in a set of single quotes to protect against the shell interpreting them. Don't use
-vand negate theprintclass and see if that works for you.And as shown in that line, always quote variables when they contain filenames.
To keep
grepfrom complaining about binary files, use-a.The variable
iis often used for an integer or an index. Useforfile.Finally: