Why isn't this regex executing?

I'm attempting to convert my personal wiki from Foswiki to Markdown files and then to a JAMstack deployment. Foswiki uses flat files and stores metadata in the following format:

%META:TOPICINFO{author="TeotiNathaniel" comment="reprev" date="1571215308" format="1.1" reprev="13" version="14"}%

I want to use a Git repo for versioning and will worry about linking that to article metadata later. For now, I simply want to convert these blocks into something that looks like this:

---
author: Teoti Nathaniel
revdate: 1539108277
---
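Because the attributes can appear in any order, one way to build that block is to look up each attribute with its own match instead of relying on position. This is a minimal Perl sketch that assumes one `%META:TOPICINFO{...}%` line per topic and double-quoted values. The sub name `topicinfo_to_front_matter` is hypothetical, and the `author` value is copied as-is. Turning `TeotiNathaniel` into `Teoti Nathaniel` would need a separate lookup.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one Foswiki TOPICINFO line into YAML front matter.
# Each attribute is matched on its own, so their order inside {...} does not matter.
sub topicinfo_to_front_matter {
    my ($line) = @_;
    # Only handle TOPICINFO blocks; return nothing for any other line.
    return unless $line =~ /%META:TOPICINFO\{([^}]*)\}%/;
    my $attrs = $1;
    my ($author) = $attrs =~ /\bauthor="([^"]*)"/;
    my ($date)   = $attrs =~ /\bdate="([^"]*)"/;
    return unless defined $author && defined $date;
    return "---\nauthor: $author\nrevdate: $date\n---\n";
}

my $meta = '%META:TOPICINFO{author="TeotiNathaniel" comment="reprev" '
         . 'date="1571215308" format="1.1" reprev="13" version="14"}%';
print topicinfo_to_front_matter($meta);
```

The `\b` before each attribute name prevents it from matching inside a longer name, for example `coauthor=`.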

After a bit of tweaking, I constructed the following regex:

author\=\['"\]\(\\w\+\)\['"\]\(\?\:\.\*\)date\=\['"\]\(\\w\+\)\['"\]

According to regex101, this works, and my two capture groups contain the desired results. But actually running it:

perl -0777 -pe 's/author\=\['"\]\(\\w\+\)\['"\]\(\?\:\.\*\)date\=\['"\]\(\\w\+\)\['"\]/author: $1\nrevdate: $2/gms' somefile.txt

gets me only this:

>

My previous attempt (which breaks if the details aren't in a specific order) looked like this and executed correctly:

perl -0777 -pe 's/%META:TOPICINFO\{author="(.*)"\ date="(.*)"\ format="(.*)"\ (.*)\}\%/author:$1 \nrevdate:$2/gms' somefile.txt

I think this is an escape-character problem, but I can't figure it out. I even used an online escaping tool to make sure my escapes are correct.

Brute-forcing my way to understanding here is feeling both inefficient and frustrating, so I'm asking the community for help.

Answer 1

OK, I kept experimenting by reducing the regex to a single term and expanding it from there. I soon got here:

$ perl -0777 -pe 's/author=['\"]\(\\w\+\)['"](?:.*)date=\['\"\]\(\\w\+\)\['\"\]/author\: \$1\\nrevdate\: \$2/gms' somefile.txt

Unmatched [ in regex; marked by <-- HERE in m/author=["](\w+)["](?:.*)date=\["](\w+)[ <-- HERE \"\]/ at -e line 1.

This eventually got me here:

perl -0777 -pe 's/author=['\"]\(\\w\+\)['"](?:.*)date=['\"]\(\\w\+\)['\"]/\nauthor\ $1\nrevdate\:$2\n/gms' somefile.txt

This produces messy output, but it works. (Note: the output is only a proof of concept. The regex can now be used within a Python script to programmatically generate Markdown metadata.)
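Most of the escaping trouble comes from embedding the regex in a single-quoted shell argument. Here is a sketch of the same substitution kept in Perl code instead, so the regex can be written exactly as tested on regex101. It also replaces the `(?:.*)` ordering with two lookaheads, so `date` may come before `author`. The sub name `convert_topicinfo` is hypothetical, and the pattern assumes attribute values contain no `}`.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every TOPICINFO block in a whole file's text.
# Each lookahead scans the {...} body independently, so author and date may
# appear in any order. ['"] accepts either quote style; no shell is involved.
sub convert_topicinfo {
    my ($text) = @_;
    $text =~ s/
        %META:TOPICINFO\{
        (?=[^}]*\bauthor=['"]([^'"]*)['"])   # capture 1: author
        (?=[^}]*\bdate=['"]([^'"]*)['"])     # capture 2: date
        [^}]*\}%
    /---\nauthor: $1\nrevdate: $2\n---/gx;
    return $text;
}

my $sample = qq{%META:TOPICINFO{date="1571215308" author="TeotiNathaniel" version="14"}%\nBody text\n};
print convert_topicinfo($sample);
```

To apply it to real files, you could read each file whole (for example with `local $/;`, which is what `-0777` does in the one-liner) and pass the contents to `convert_topicinfo`.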

Thanks for being my rubber duckie, StackOverflow. Hopefully this is useful to someone, somewhere, somewhen.

Answer 2

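The first major problem is that you're trying to use single quotes (') in the program, when the program itself is being passed to the shell in single quotes. Each ' inside the program ends or restarts the shell's quoted string, so the shell reinterprets the rest of the program. In your command this leaves a " unclosed, and the > you saw is the shell's continuation prompt waiting for more input, not output from Perl.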

Escape any instance of ' in the program by using '\''. You could also use \x27 if the quote appears inside a double-quoted string literal or a regex literal (as is the case for every instance in your program).

perl -0777pe's/author=['\''"].../.../gs'
perl -0777pe's/author=[\x27"].../.../gs'
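To check that `\x27` really stands in for the single quote inside a regex character class, here is a small Perl sketch. The sample strings are made up for illustration. `[\x27"]` matches either quote character, just as `['"]` would.

```perl
use strict;
use warnings;

# \x27 is the hex escape for the ASCII single quote ('), so [\x27"] is the same
# character class as ['"], written without a literal single quote.
my $re = qr/author=[\x27"](\w+)[\x27"]/;

my @matched;
for my $s ('author="TeotiNathaniel"', "author='TeotiNathaniel'", 'author=TeotiNathaniel') {
    my ($who) = $s =~ $re;
    push @matched, defined $who ? $who : '(no match)';
}
print "$_\n" for @matched;
```

Because `[\x27"]` contains no literal `'`, the same class can be pasted unchanged into a single-quoted `perl -pe '...'` one-liner.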

Answer 3

I would try to break it down into a clean data structure and then process it. Separating the data processing from the printing lets you modify the code to add extra data later, and it also makes it far more readable. Please see the example below:

#!/usr/bin/env perl
use strict;
use warnings;
## yaml to print the data, not required for operation
use YAML::XS qw(Dump);
my $yaml;

my @lines = '%META:TOPICINFO{author="TeotiNathaniel" comment="reprev" date="1571215308" format="1.1" reprev="13" version="14"}%';

for my $str (@lines )
{
    ### split line into component parts; skip lines that aren't %META...{...}% blocks
    my ( $type , $subject , $data ) = $str =~ /\%(.*?):(.*?)\{(.*)\}\%/ or next;
    ## break data in {} into a hash
    my %info = map( split(/=/),  split(/\s+/, $data) );

    ## strip quotes if any exist
    s/^"(.*)"$/$1/ for values %info;

    #add to data structure
    $yaml->{$type}{$subject} = \%info;
}
## yaml to print the data, not required for operation
print Dump($yaml);

## loop data and print
for my $t (keys %{ $yaml } ) {
    for my $s (keys %{ $yaml->{$t} } ) {
        print "-----------\n";
        print "author: ".$yaml->{$t}{$s}{"author"}."\n";
        print "date: ".$yaml->{$t}{$s}{"date"}."\n";
    }
}
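One caveat with `split(/\s+/, $data)` is that a quoted value containing spaces, such as a multi-word `comment`, would be broken into fragments. The sketch below instead pulls out `key="value"` pairs with a single global match. It assumes values are double-quoted and contain no embedded `"`. The sub name `parse_meta` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one %META:TYPE{key="value" ...}% line into
# (type, hashref). A list-context //g match returns every (key, value) capture
# pair, which assigns straight into a hash, so spaces inside values survive.
sub parse_meta {
    my ($line) = @_;
    my ($type, $body) = $line =~ /^%META:(\w+)\{(.*)\}%\s*$/ or return;
    my %attr = $body =~ /(\w+)="([^"]*)"/g;
    return ($type, \%attr);
}

my ($type, $info) = parse_meta(
    '%META:TOPICINFO{author="TeotiNathaniel" comment="a multi word note" date="1571215308"}%'
);
print "---\nauthor: $info->{author}\nrevdate: $info->{date}\n---\n";
```

Lines that are not `%META` blocks return an empty list, so a caller can simply skip them.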