HTML::Tree: Can't call method "as_text" on an undefined value


I am parsing a real estate web page, using HTML::TreeBuilder, and have the following code:

$values{"Pcity"} = $address->look_down("_tag" => "span", 
                   "itemprop" => "addressLocality")->as_text;
$values{"PState"} = $address->look_down("_tag" => "span", 
                   "itemprop" => "addressRegion")->as_text;

Some pages don't contain city or state, and the parser exits with an error:

Can't call method "as_text" on an undefined value

To fix it I used the following method:

$values{"Pcity"} = $address->look_down("_tag" => "span", 
                   "itemprop" => "addressLocality");
if(defined($values{"Pcity"}))
{
    $values{"Pcity"} = $values{"Pcity"}->as_text;
}
else
{
    $values{"Pcity"} = '';
}

It works, but now I have 9 lines instead of 1, and since I have many places like this, the code will become considerably bigger.

Is there any way to optimize?


There are 2 best solutions below

5
On BEST ANSWER

This is shorter:

# Avoid $a: it is a package global reserved for sort blocks
my $span = $address->look_down("_tag" => "span", "itemprop" => "addressLocality");
$values{"Pcity"} = $span ? $span->as_text : '';
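Since the question mentions many such lookups, the same check can be wrapped in a small subroutine so each field stays on one line. This is only a sketch: `span_text` is a hypothetical helper name (it is not part of HTML::Tree), and the sample HTML is made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: text of the first <span> under $node whose
# itemprop equals $prop, or '' when no such span exists.
sub span_text {
    my ($node, $prop) = @_;
    # In scalar context look_down returns the first match or undef
    my $span = $node->look_down(_tag => 'span', itemprop => $prop);
    return defined $span ? $span->as_text : '';
}

# Made-up sample page that has a city but no state
my $address = HTML::TreeBuilder->new_from_content(
    '<div><span itemprop="addressLocality">Springfield</span></div>'
);

my %values;
$values{Pcity}  = span_text($address, 'addressLocality');
$values{PState} = span_text($address, 'addressRegion');

$address->delete;    # free the tree once it is no longer needed
```

With this in place, a missing span simply leaves the corresponding entry as an empty string instead of killing the script.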
0
On

Assuming that $address never contains more than one <span> with either of the given values for the itemprop attribute, you could write this

for my $span ( $address->look_down(_tag => 'span') ) {
   # attr returns undef for spans without itemprop; default to '' to avoid warnings
   my $itemprop    = $span->attr('itemprop') // '';
   $values{Pcity}  = $span->as_text if $itemprop eq 'addressLocality';
   $values{PState} = $span->as_text if $itemprop eq 'addressRegion';
}

But navigating HTML trees is much simpler with HTML::TreeBuilder::XPath, which lets you address nodes with XPath expressions instead of the clumsy look_down. A solution using it would look like this. Note that findvalue returns an empty string '' for non-existent nodes rather than undef, which should be workable for you, as it still evaluates to false.

use strict;
use warnings;

use HTML::TreeBuilder::XPath;

my $xp = HTML::TreeBuilder::XPath->new_from_file(*DATA);

my %values;

$values{Pcity}  = $xp->findvalue('//span[@itemprop="addressLocality"]');
$values{PState} = $xp->findvalue('//span[@itemprop="addressRegion"]');

use Data::Dump;
dd \%values;

__DATA__
<html>
<head>
  <title>Title</title>
</head>
<body>
  <span itemprop="addressLocality">My Locality</span>
  <span itemprop="addressRegion">My Region</span>
</body>
</html>

output

{ Pcity => "My Locality", PState => "My Region" }
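If there are many such fields, the findvalue calls could be driven from a lookup table rather than written out one per line. The following is a sketch under assumptions: the `%itemprop_for` mapping and the sample HTML are made up for illustration, and it relies on findvalue returning '' for a missing node as described above.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Made-up sample page that has a locality but no region
my $xp = HTML::TreeBuilder::XPath->new_from_content(
    '<html><body><span itemprop="addressLocality">My Locality</span></body></html>'
);

# Hypothetical mapping from result keys to itemprop values
my %itemprop_for = (
    Pcity  => 'addressLocality',
    PState => 'addressRegion',
);

my %values;
for my $key ( keys %itemprop_for ) {
    # Build the XPath expression for this itemprop value
    my $path = sprintf '//span[@itemprop="%s"]', $itemprop_for{$key};
    $values{$key} = $xp->findvalue($path);
}

$xp->delete;    # free the tree once it is no longer needed
```

Adding another field then only means adding one entry to the mapping.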