My regex to find opening braces at the beginning of a block is not working


I just started learning Perl about a week ago, but I have a basic knowledge of regex features like back-references, lookarounds, etc. So I wrote a small regex to match strings in an array (the array holds every line of a file) that have only '{' as a printable character.

My code goes like this:

for my $f_line (@file_lines) {
    my @opening_brace;
    if ($f_line =~ /(^[[:blank:]]*(?={))({[[:blank:]]*$)/) {
        @opening_brace = $2;
        print "opening brace : @opening_brace \n";
    }
}

However, my regex couldn't get me into the if block, even though it worked fine with grep when I tested it against the target file.

What am I doing wrong?

I tried:

echo "{       " | grep -P '(^[[:blank:]]*(?={))({[[:blank:]]*$)'

and got:

{


There are 2 best solutions below

5
Hazel Daniel On

Oh, I just found a fix. It appears my file 'lines' are not really lines. The problem lies in the subtle difference between the logical lines within a string and the literal anchors denoting the start and end of a file line, which arose when I pushed the lines into an array as strings. Thanks for the help, though.
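One possible reading of this, offered only as an assumption since the post does not show how the array was filled: an array element ended up holding several lines in one string. Without the /m modifier, ^ and $ anchor to the string as a whole rather than to each embedded line. A minimal sketch with hypothetical sample data (braces escaped as \{ as a precaution):

```perl
use strict;
use warnings;

# Hypothetical data: a single array element that holds several file lines.
my @file_lines = ("sub foo\n{\n    return 1;\n}\n");

# Without /m, ^ matches only at the start of the whole string.
my $plain_hit = $file_lines[0] =~ /^[[:blank:]]*(\{[[:blank:]]*)$/ ? 1 : 0;

# With /m, ^ and $ also match at each embedded line boundary.
my $multiline_hit = $file_lines[0] =~ /^[[:blank:]]*(\{[[:blank:]]*)$/m ? 1 : 0;

# Alternative: split into real lines first, then test each line on its own.
my @real_lines = split /\n/, $file_lines[0];
my @braces     = grep { /^[[:blank:]]*\{[[:blank:]]*$/ } @real_lines;

print "without /m: $plain_hit\n";
print "with /m:    $multiline_hit\n";
print "brace-only lines after split: ", scalar(@braces), "\n";
```

This would also be consistent with grep finding the brace, since grep tests each line of the file separately.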

1
zdim On

The regex, copied here verbatim, works

echo "{   " | perl -wnlE'say $1 if /^[[:blank:]]*(?={)({[[:blank:]]*$)/'

Prints a line: {

But there is no $2, used in the question, as the regex in the one-liner above captures only once. It seems that you expect the (?=...) of the lookahead to also capture: it doesn't. We need extra parentheses for that, (?=({)). So either add that to your regex, or keep the regex as it is and use $1 in the code. (Unless the data itself is in fact different from what the question implies.)

Then, I don't see why one would use a lookahead followed by an actual consuming match for that very pattern. (An exercise?)
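To make the group counting concrete, here is a minimal sketch (not from the original post) that applies three variants to the same string and lists what each one captures. The helper name captures and the sample string are assumptions for illustration, and the braces are escaped as \{ purely as a precaution, which does not change what the patterns match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the list of captures (empty list on no match).
sub captures {
    my ($re, $str) = @_;
    my @groups = $str =~ $re;
    return @groups;
}

my $line = "{   ";   # sample input: a brace followed by three spaces

# Variant from the one-liner: the lookahead asserts but does not capture.
my @one = captures(qr/^[[:blank:]]*(?=\{)(\{[[:blank:]]*$)/, $line);

# Extra parentheses inside the lookahead add a capture group.
my @two = captures(qr/^[[:blank:]]*(?=(\{))(\{[[:blank:]]*$)/, $line);

# Leading part wrapped in its own parentheses, as in the question's listing.
my @wrapped = captures(qr/(^[[:blank:]]*(?=\{))(\{[[:blank:]]*$)/, $line);

for my $result ([one => \@one], [two => \@two], [wrapped => \@wrapped]) {
    my ($name, $groups) = @$result;
    printf "%-8s %d capture(s): %s\n", $name, scalar @$groups,
        join ", ", map { "[$_]" } @$groups;
}
```

In the third variant $1 holds the (possibly empty) leading whitespace, so the brace ends up in $2.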


It came up in comments that input may contain a newline, like {\n. The regex from the question, used above, still works.

One way to readily see that is to remove the -l switch, which chomps the newline (so use -wnE). Then the regex is applied to a string ending with the newline that echo adds, and we still get a match and capture, because $ also matches just before a newline at the end of the string. (The POSIX character class [[:blank:]] does not match a linefeed.)
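The same check can be written as a small script instead of a one-liner. This is a sketch under the assumption that the input is a brace, three spaces and a newline; it also contrasts $ with \z, which is not used in the question, to show where the newline would matter. Braces are escaped as \{ as a precaution.

```perl
use strict;
use warnings;

my $with_newline = "{   \n";   # what the regex sees without chomp or -l

# $ matches at the end of the string or just before a final newline.
my ($captured) = $with_newline =~ /^[[:blank:]]*(?=\{)(\{[[:blank:]]*$)/;

# \z matches only at the very end, so the trailing newline blocks it.
my $strict_end = $with_newline =~ /^[[:blank:]]*\{[[:blank:]]*\z/ ? 1 : 0;

# [[:blank:]] covers horizontal whitespace such as space and tab, not "\n".
my $blank_eats_newline = "\n" =~ /[[:blank:]]/ ? 1 : 0;

print "captured with \$: [$captured]\n";
print "match with \\z: $strict_end\n";
print "[[:blank:]] matches newline: $blank_eats_newline\n";
```

The capture contains the brace and the trailing spaces but not the newline, since [[:blank:]]* stops before it.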


A general note. That $2 in the question is assigned to an array, as @opening_brace = $2. That can be done, and afterwards the array has that one element. However, it is very misleading, and assigning to an array overwrites whatever may have been in it.

We add to an array with push @arrayname, LIST, so in this case push @opening_brace, $2; (but see the discussion above regarding that $2). Or change @opening_brace to a scalar, $opening_brace, if the @ sigil for an array is there by mistake.
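A short sketch of the difference, using hypothetical sample lines and a simplified single-group pattern (so the capture is $1 here). Note that the array is declared before the loop; in the question it is declared inside the loop, so it would start out empty on every iteration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of a file.
my @file_lines = ("sub foo", "{", "    return 1;", "}", "  {  ");

my @opening_brace;   # filled with push, declared once outside the loop
my @overwritten;     # filled by plain assignment, for comparison

for my $f_line (@file_lines) {
    if ($f_line =~ /^[[:blank:]]*(\{[[:blank:]]*)$/) {
        push @opening_brace, $1;   # appends, earlier elements are kept
        @overwritten = $1;         # replaces the whole array each time
    }
}

printf "pushed:   %d element(s)\n", scalar @opening_brace;
printf "assigned: %d element(s)\n", scalar @overwritten;
```

With these lines, push collects both brace-only lines while the plain assignment keeps only the last one.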