Why doesn't this regex catch the periods correctly?


I'm fiddling around trying to learn more about shell scripting. So, I have some files with email in them that spamassassin writes to a directory, and I thought I would try to do some regex matching on them. So, I select files that require different matches and then try to sort through them.

I wrote this script:

#!/usr/local/bin/bash
#
regex='(\.)?'
files="/var/spool/spam/testing/out.*"
for i in $files; do
domain=`cat $i | grep -i "Message-ID: <" | cut -d'@' -f2 | cut -d'>' -f1 | cut -d' ' -f1`
echo "Domain is $domain"
echo "We're starting the if loop"
if [ -z "$domain" ];
then
echo "Domain is empty"
echo $i
#rm $i
elif ! [[ "$domain" =~ $regex ]];
then
echo "There are no periods in the domainname $domain"
elif [[ $domain =~ $regex ]];
then
echo "There are periods in the domainname $domain"
fi
done

What I'm trying to accomplish is to separate out the domain part of the Message-ID: header and then determine what that domain is. Some Message-IDs have no domain at all. Some have bogus domains. Some have domains like this: yahoo.co.uk.

Every message has two Message-ID: entries, so the domain names end up appearing twice.

When I run this script on two files, this is the result I get:

# bash /usr/local/bin/rm-bounces.sh 
Domain is xbfoqrka
xbfoqrka
We're starting the if loop
There are periods in the domainname xbfoqrka
xbfoqrka
Domain is SKY-20150201SFT.com
SKY-20150201SFT.com
We're starting the if loop
There are periods in the domainname SKY-20150201SFT.com
SKY-20150201SFT.com

What I don't understand is why xbfoqrka matches the regex that's supposed to find periods in the domain name but doesn't match the test that looks for NO periods in the domain name. I'm escaping the period, so it should be an exact match, and there's no period in xbfoqrka.

There is 1 best solution below

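The ? quantifier means "zero or one". So the regex (\.)? matches zero or one period, and because =~ looks for a match anywhere in the string, finding zero periods is enough for it to succeed. Since there's no . in xbfoqrka, the regex still finds a match (of zero periods).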
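
To see this for yourself, here is a small sketch (assuming bash, since it uses [[ ... =~ ... ]] and BASH_REMATCH). The function name matches_optional_period is made up for illustration and is not part of the original script. It shows that the pattern succeeds even on an empty string, and that the text it actually matched is empty.

```shell
#!/usr/bin/env bash
# Demonstration only: '(\.)?' can match "zero periods", so it succeeds on any input.
regex='(\.)?'

# Hypothetical helper: returns success if the input matches the optional-period regex.
matches_optional_period() {
    [[ "$1" =~ $regex ]]
}

for s in "xbfoqrka" "SKY-20150201SFT.com" ""; do
    if matches_optional_period "$s"; then
        # BASH_REMATCH[0] holds the text the regex matched; here it is the empty string,
        # because the leftmost possible match is "zero periods" at the start of the input.
        echo "'$s' matched, matched text is '${BASH_REMATCH[0]}'"
    else
        echo "'$s' did not match"
    fi
done
```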

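Note that the regex will return true for a string with any number of periods: zero, one, three, 100 and so on. That's because even a string with 100 periods contains a place where "zero or one period" matches. To test for at least one period, drop the ? and use regex='\.' instead.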
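
As a rough sketch of that fix (assuming bash; has_period is a hypothetical helper name, and the sample domains are the ones from the question), the pattern below requires one literal period, so a name without any period no longer matches:

```shell
#!/usr/bin/env bash
# Sketch: classify a domain by whether it contains at least one literal period.

# Hypothetical helper: succeeds only if the argument contains a period.
has_period() {
    local regex='\.'        # one escaped period, with no '?' quantifier
    [[ "$1" =~ $regex ]]    # $regex is left unquoted so it is treated as a regex
}

for domain in "xbfoqrka" "SKY-20150201SFT.com" "yahoo.co.uk"; do
    if has_period "$domain"; then
        echo "There are periods in the domainname $domain"
    else
        echo "There are no periods in the domainname $domain"
    fi
done
```

With this pattern a single if/else is enough; the separate elif that repeats the same test in the original script is not needed.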