Trying to understand this Perl regex bracketed character class


Below is a script that I was playing with. As written, it prints: a

$tmp = "cd abc/test/.";
if ( $tmp =~ /cd ([\w\/\.])/ ) {
   print $1."\n";
}

BUT if I change it to:

$tmp = "cd abc/test/.";
if ( $tmp =~ /cd ([\w\/\.]+)/ ) {
   print $1."\n";
}

then it prints: abc/test/.

From my understanding, the + matches one or more of the preceding item; please correct me if I am wrong. But why does the first case match only a? I thought it should match nothing!

Thank you.


There are 2 best solutions below

BEST ANSWER

You are correct. In the first case you match a single character from that character class, while in the second you match at least one, with as many as possible after the first one.
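
The difference described above can be checked directly. This is a minimal sketch, not the asker's exact script: it uses the /x modifier so the patterns can carry comments, and the variable names $one and $many are made up for illustration.

```perl
use strict;
use warnings;

my $tmp = "cd abc/test/.";

# Single character from the class: captures only the first one.
my ($one) = $tmp =~ /
    cd\            # literal "cd " (the space is escaped because of x)
    ( [\w\/\.] )   # exactly one word character, slash, or dot
/x;

# One or more characters from the class, greedy.
my ($many) = $tmp =~ /
    cd\            # literal "cd "
    ( [\w\/\.]+ )  # as many class characters as possible
/x;

print "$one\n";    # a
print "$many\n";   # abc/test/.
```

Note that under /x a comment inside the pattern must not contain the pattern delimiter, which is why the comments here avoid a bare slash.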

First one:

"
cd\            # Match the characters “cd ” literally
(              # Match the regular expression below and capture its match into backreference number 1
   [\w\/\.]       # Match a single character present in the list below
                     # A word character (letters, digits, etc.)
                     # A / character
                     # A . character
)
"

Second one:

"
cd\            # Match the characters “cd ” literally
(              # Match the regular expression below and capture its match into backreference number 1
   [\w\/\.]       # Match a single character present in the list below
                     # A word character (letters, digits, etc.)
                     # A / character
                     # A . character
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
"
0
On

In a regex, a bracketed character class matches exactly one character from the set it lists. In other words, [\w\/\.] matches exactly one of the following characters:

  1. An alphanumeric character or "_" (the \w).
  2. A forward slash (the \/--notice that the forward slash needs to be escaped, since it is used as the default marker for the beginning and end of a regex)
  3. A period (the \.--escaped here as well; outside a bracketed class an unescaped . matches any character except the newline character, although inside the brackets it is already literal).
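
To illustrate the escaping points in the list, here is a small sketch. It assumes the same input string as the question. Switching the delimiter to m{...} is a choice made here for illustration, not something the question uses. With a different delimiter the slash no longer needs a backslash, and inside a bracketed class the dot is literal even when unescaped.

```perl
use strict;
use warnings;

my $tmp = "cd abc/test/.";

# Escaped form from the question.
my ($escaped) = $tmp =~ /cd ([\w\/\.]+)/;

# Same class without backslashes, using m{...} so "/" is not the delimiter.
my ($plain) = $tmp =~ m{cd ([\w/.]+)};

# Inside the class "." is literal, so it does not match "!".
my $dot_is_literal = ("cd !" =~ m{cd ([.])}) ? 0 : 1;

print "$escaped\n";
print "$plain\n";
print "dot is literal in a class\n" if $dot_is_literal;
```

Both captures should come out identical, which suggests the backslashes on the dot are harmless but optional, and the one on the slash is only needed because / is the delimiter.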

Because /cd ([\w\/\.])/ captures only one character into $1, it grabs the first character after "cd ", which in this case is "a".

You are correct that the + allows for a match of one or more such characters. Since quantifiers are greedy by default, you should get all of "abc/test/." for $1 in the second match.
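
As for why the first pattern matches at all instead of failing: a regex does not have to consume the whole string unless it is anchored. Here is a minimal sketch, assuming the same input. The \A and \z anchors and the non-greedy +? are added only for comparison and are not part of the original patterns.

```perl
use strict;
use warnings;

my $tmp = "cd abc/test/.";

# Unanchored: matching "cd a" is enough, the rest of the string is ignored.
my $unanchored = ($tmp =~ /cd ([\w\/\.])/) ? $1 : undef;

# Anchored at both ends: one class character cannot cover "abc/test/.", so no match.
my $anchored = ($tmp =~ /\Acd ([\w\/\.])\z/) ? $1 : undef;

# Greedy + takes as much as it can; non-greedy +? stops at the first success.
my ($greedy) = $tmp =~ /cd ([\w\/\.]+)/;
my ($lazy)   = $tmp =~ /cd ([\w\/\.]+?)/;

print "unanchored: $unanchored\n";                       # a
print "anchored:   ", defined $anchored ? $anchored : "no match", "\n";
print "greedy:     $greedy\n";                           # abc/test/.
print "lazy:       $lazy\n";                             # a
```

So "match nothing" is what the single-character pattern would do only if it were forced to account for the entire string.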

If you haven't already done so, you might want to peruse perldoc perlretut.