
Extract email from string using Template Toolkit


I'm guessing this is relatively simple, but I can't find the answer.

From a string such as '"John Doe" <[email protected]>', how can I extract the email portion using Template Toolkit?

An example string to parse is this:

$VAR1 = { 
    'date' => '2021-03-25',
    'time' => '03:58:18',
    'href' => 'https://example.com',
    'from' => '[email protected] on behalf of Caroline <[email protected]>',
    'bytes' => 13620,
    'pmail' => '[email protected]',
    'sender' => '[email protected]',
    'subject' => 'Some Email Subject'
};

My code, based on @dave-cross's help below, where $VAR1 is the output of dumper.dump(item.from):

[% text = item.from -%]
[% IF (matches = text.match('(.*?)(\s)?+<(.*?)>')) -%]
<td>[% matches.1 %]</td>
[% ELSE -%]
<td>[% text %]</td>
[% END %]

However, it's still not matching against $VAR1.

Dave Cross On BEST ANSWER

This does what you want, but it's pretty fragile and this really isn't the kind of thing that you should be doing in TT code. You should either get the data parsed outside of the template and passed into variables, or you should pass in a parsing subroutine that can be called from inside the template.
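
As a rough illustration of the second option, the sketch below passes a parsing subroutine into the template from Perl. The name extract_email, its regex, and the sample address are assumptions made for this example; they are not taken from the question's code.

```perl
use strict;
use warnings;
use Template;

# Hypothetical helper: return the contents of a trailing <...> part,
# or the whole string unchanged if there is none.
sub extract_email {
    my ($from) = @_;
    return $from =~ /<([^<>]*)>\s*$/ ? $1 : $from;
}

my $tt = Template->new or die Template->error;

# Sample data; the address is a placeholder.
my $vars = {
    item          => { from => '"John Doe" <john.doe@example.com>' },
    extract_email => \&extract_email,
};

my $template = '<td>[% extract_email(item.from) %]</td>';
my $output   = '';

# process() accepts a reference to the template text and a reference
# to a scalar that receives the rendered output.
$tt->process(\$template, $vars, \$output) or die $tt->error;

print "$output\n";
```

The template then only has to call the subroutine, and the parsing logic stays in Perl where it can be tested on its own.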

But, having given you the caveats, if you still insist this is what you want to do, then this is how you might do it:

In test.tt:

[% text = '"John Doe" <[email protected]>';
   matches = text.match('"(.*?)"\s+<(.*?)>');
   IF matches -%]
Name: [% matches.0 %]
Email: [% matches.1 %]
[% ELSE -%]
No match found
[% END -%]

Then, testing using tpage:

$ tpage test.tt
Name: John Doe
Email: [email protected]

But I cannot emphasise enough that you should not be doing it like this.

Update: I've used this test template to investigate your further problem.

[% item = { from => '"John Doe" <[email protected]>' };
   text = item.from -%]
[% IF (matches = text.match('(.*?)(\s)?+<(.*?)>')) -%]
<td>[% matches.1 %]</td>
[% ELSE -%]
<td>[% text %]</td>
[% END %]

And running it, I get this:

$ tpage test2.tt
<td> </td>

That's what I'd expect to see for a match. You're printing matches.1. That's the second item from the matches array. And the second match group is (\s). So I'm getting the space between the name and the opening angle bracket.

You probably don't want that whitespace match in your matches array, so I'd remove the parentheses around it, to make the regex (.*?)\s*<(.*?)> (note that \s* is a simpler way to say "zero or more whitespace characters").

You can now use matches.0 to get the name and matches.1 to get the email address.
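
Since the match vmethod relies on Perl's regex engine, the simplified pattern can also be checked in plain Perl. This is only a sketch: the addresses below are placeholders, not the ones from the question.

```perl
use strict;
use warnings;

# Placeholder input, shaped like the 'from' field in the question.
my $from = 'list@example.com on behalf of Caroline <caroline@example.com>';

# Two capture groups only: the text before the bracket, and the address.
my @matches = $from =~ /(.*?)\s*<(.*?)>/;

print "Name:  $matches[0]\n";
print "Email: $matches[1]\n";
```

With the whitespace group gone, index 0 holds everything before the bracket and index 1 holds the address, which mirrors matches.0 and matches.1 in the template.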

Oh, and there's no need to copy item.from into text. You can call the match vmethod on any scalar variable, so it's probably simpler to just use:

[% matches = item.from.match(...) -%]

Did I mention that this is all a really terrible idea? :-)

Update2:

This is all going to be far easier if you give me complete, runnable code examples in the same way that I am doing for you. Any time I have to edit something in order to get an example running, we run the risk that I'm guessing incorrectly how your code works.

But, bearing that in mind, here's my latest test template:

[% item = {
    'date' => '2021-03-25',
    'time' => '03:58:18',
    'href' => 'https://example.com',
    'from' => '[email protected] on behalf of Caroline <[email protected]>',
    'bytes' => 13620,
    'pmail' => '[email protected]',
    'sender' => '[email protected]',
    'subject' => 'Some Email Subject'
};
   text = item.from -%]
[% IF (matches = text.match('(.*?)(\s)?<(.*?)>')) -%]
<td>[% matches.2 %]</td>
[% ELSE -%]
<td>[% text %]</td>
[% END %]

I've changed the definition of item to have your full example. I've left the regex as it was before my suggestions. And (because I haven't changed the regex) I've changed the output to print matches.2 instead of matches.1.

And here's what happens:

$ tpage test3.tt
<td>[email protected]</td>

So it works.

If yours doesn't work, then you need to identify the differences between my (working) code and your (non-working) code. I'm happy to help you identify those differences, but you have to give me your non-working example in order for me to do that.

Update3:

Again I've tried to incorporate the changes that you're talking about. But again, I've had to guess at stuff because you're not sharing complete runnable examples. And again, my code works as expected.

[% USE dumper -%]
[% item = {
    'date' => '2021-03-25',
    'time' => '03:58:18',
    'href' => 'https://example.com',
    'from' => '[email protected] on behalf of Caroline <[email protected]>',
    'bytes' => 13620,
    'pmail' => '[email protected]',
    'sender' => '[email protected]',
    'subject' => 'Some Email Subject'
};
 -%]
[% matches = item.from.match('(.*?)(\s)?<(.*?)>') -%]
[% dumper.dump(matches) %]

And testing it:

$ tpage test4.tt
$VAR1 = [
          '[email protected] on behalf of Caroline',
          ' ',
          '[email protected]'
        ];

So that works. If you want any more help, then send a complete runnable example. If you don't do that, I won't be able to help you any more.

choroba On

I have no idea how Template Toolkit can help you. Use Email::Address or Email::Address::XS to parse an e-mail address.
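
A minimal sketch of that approach with Email::Address, assuming the module is installed from CPAN. The sample address is a placeholder. A string like the "on behalf of" example in the question may yield more than one address, so this sketch simply takes the first.

```perl
use strict;
use warnings;
use Email::Address;

# Placeholder input in the same shape as the question's first example.
my $from = '"John Doe" <john.doe@example.com>';

# parse() returns a list of Email::Address objects found in the string;
# only the first one is kept here.
my ($addr) = Email::Address->parse($from);

if ($addr) {
    print "Email: ", $addr->address, "\n";
    print "Name:  ", $addr->name, "\n";
}
```

The resulting values can then be passed to the template as ordinary variables.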

brian d foy On

There's a very old (and unmaintained) module, Template::Extract, that lets you define a template, then work backward from a string that might have been produced by that template:

use Template::Extract;
use Data::Dumper;

my $obj = Template::Extract->new;
my $template = qq("[% name %]" <[% email %]>);

my $string = '"John Doe" <[email protected]>';

my $extracted = $obj->extract($template, $string);

print Dumper( $extracted );

The output is:

$VAR1 = {
          'email' => '[email protected]',
          'name' => 'John Doe'
        };

However, there are modules that already do this job for you and will handle many more situations.