How to get a group matched in a recursive regular expression?

292 Views Asked by At

I'm writing a simple regular expression that needs receive a pair of coordinates and/or a map name.

For example:

move 10 15 # should returns [[10, 15]]
move 10 15 map # should returns [[10, 15, 'map']]
move map # should returns [['map']]
move 10 15 mapA mapB # should returns [[10, 15, 'mapA'], ['mapB']] 
move 10 15 mapA mapB 33 44 # should returns [[10, 15, 'mapA'], ['mapB'], [33, 44]]
move 10 15 mapA 33 44 mapB # should returns [[10, 15, 'mapA'], [33, 44, 'mapB']]

Then, I wrote this regular expression:

/
  (?(DEFINE)
     (?<coord>    (?<x>\d+)\s+(?<y>\d+) )
     (?<map>      (?<mapname>[a-zA-Z]+) )
     (?<commands> \s* (?: (?&coord) | (?&map) ) \s* (?&commands)? )
  )
  move\s+(?&commands)
/six

But how I can get the value for groups x, y and map using Perl?

I tried with some ways:

use strict;
use warnings;

my $command = 'move 10 15';

$command =~ /
  (?(DEFINE)
     (?<coord>    (?<x>\d+)\s+(?<y>\d+) )
     (?<map>      (?<mapname>[a-zA-Z]+) )
     (?<commands> \s* (?: (?&coord) | (?&map) ) \s* (?&commands)? )
  )
  move\s+(?&commands)
/six;

while (my ($k,$v) = each %+) { print "$k $v\n" }
print "$+{x}";
3

There are 3 best solutions below

1
On BEST ANSWER

As the question stands, you can't have that. The perlre says about this

Note that capture groups matched inside of recursion are not accessible after the recursion returns, so the extra layer of capturing groups is necessary.

but the pattern <x> can't be had with "an extra layer" of capturing afterwards since it's used only inside the grammar. You can only have the whole thing

if ($command =~ /
        move\s+ (?<match>(?&commands))
        (?(DEFINE)
            (?<coord>    (?<x>\d+)\s+(?<y>\d+) )
            (?<map>      (?<mapname>[a-zA-Z]+) )
            (?<commands> \s* (?: (?&coord) | (?&map) ) \s* (?&commands)? )
        )
    /six)
{
    say "got: $+{match}";
}

where I've moved the ?(DEFINED) block at the end of the pattern, as recommended.

Note that it wouldn't make sense either: in a recursive match, which of multiple <x> should one get? So you'd need to restructure the approach so to be able to re-capture the match you want; but I don't see how to do this if you want a subpattern buried so deeply.

For the problem as presented I'd go with writing a simple parser, never mind a catch-all regex. Or, in your approach re-process the match for its parts, hopefully much easier once you have it.

And then there are powerful tools, like Marpa::R2, Parse::RecDescent, Regexp::Grammars.

0
On

Since I cannot yet comment, Stefan Becker's solution has a flaw.

It will fail if a coordinate is 0.

Here's the fix:

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

while (<DATA>) {
    my @row;
    chomp;
    if (/^move/) {
        while (/(?:(\d+)\s+(\d+))?(?:\s+([[:alpha:]]+))?/g) {
            my @match;
            push(@match, +$1, +$2) if defined $1 && defined $2;
            push(@match, $3)       if $3;
            push(@row, \@match) if @match;
        }
    }

    print "$_: ", Dumper(\@row);
}

exit 0;

__DATA__
move 10 15
move 10 15 map
move map
move 10 15 mapA mapB
move 10 15 mapA mapB 33 44
move 10 15 mapA 33 44 mapB
move 0 15 mapA 33 44 mapB
0
On

Maybe it is better to divide and conquer than to force feed everything into one regex?

#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

while (<DATA>) {
    my @row;
    chomp;
    if (/^move/) {
        while (/(?:(\d+)\s+(\d+))?(?:\s+([[:alpha:]]+))?/g) {
            my @match;
            push(@match, +$1, +$2) if $1 && $2;
            push(@match, $3)       if $3;
            push(@row, \@match) if @match;
        }
    }

    print "$_: ", Dumper(\@row);
}

exit 0;

__DATA__
move 10 15
move 10 15 map
move map
move 10 15 mapA mapB
move 10 15 mapA mapB 33 44
move 10 15 mapA 33 44 mapB

Test run:

$ perl dummy.pl
move 10 15: $VAR1 = [
          [
            '10',
            '15'
          ]
        ];
move 10 15 map: $VAR1 = [
          [
            '10',
            '15',
            'map'
          ]
        ];
move map: $VAR1 = [
          [
            'map'
          ]
        ];
move 10 15 mapA mapB: $VAR1 = [
          [
            '10',
            '15',
            'mapA'
          ],
          [
            'mapB'
          ]
        ];
move 10 15 mapA mapB 33 44: $VAR1 = [
          [
            '10',
            '15',
            'mapA'
          ],
          [
            'mapB'
          ],
          [
            '33',
            '44'
          ]
        ];
move 10 15 mapA 33 44 mapB: $VAR1 = [
          [
            '10',
            '15',
            'mapA'
          ],
          [
            '33',
            '44',
            'mapB'
          ]
        ];