The input HTML is attached (my $file). With the following script, I cannot extract the table I want. Any suggestions?
use strict;
use warnings;
use HTML::TableExtract;

my $file = "view-source_www.nasdaq.com_dividend-stocks_dividend-calendar.aspx_date=2017-Apr-19.html";
# Use "or" rather than "||" here so that the die fires when open fails
open my $fh, '<', $file or die "Cannot open $file: $!";
my $content;
{
    local $/ = undef; # slurp mode
    $content = <$fh>;
}
close $fh;

my $te;
$te = HTML::TableExtract->new( headers => [qw(Announcement_Date)] );
$te->parse($content);
# Examine all matching tables
foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}
Two problems here.
Firstly, as jcaron points out in a comment, you're not parsing the right thing. You seem to be parsing a "view source" page. You need to get the HTML directly. You can do that with LWP::Simple.
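As a minimal sketch of fetching the page directly, something like the following might work. The helper name fetch_html is made up for illustration, and the URL in the usage comment is an assumption reconstructed from the name of your saved file (including the http scheme), so adjust it as needed.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# fetch_html is a hypothetical helper name, not part of LWP::Simple.
# get() returns undef on failure, so check before using the result.
sub fetch_html {
    my ($url) = @_;
    my $html = get($url);
    die "Could not fetch $url\n" unless defined $html;
    return $html;
}

# Usage (the URL is an assumption based on the saved file's name):
# my $content = fetch_html('http://www.nasdaq.com/dividend-stocks/dividend-calendar.aspx?date=2017-Apr-19');
```

The returned string can then be passed to $te->parse in place of the slurped file contents.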
Running your code now gives no errors but, unfortunately, it gives no output either. That's because you're defining the headers argument to the object constructor incorrectly. You use qw(Announcement_Date), but there is no table header with the value "Announcement_Date", so no matching table is found.
If you change the constructor call so that the header text matches what actually appears in the table, then you get the expected output.
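To illustrate how the headers option matches against the header cell text, here is a small self-contained sketch. The sample markup, the company name, the date, and the helper rows_for_headers are all made up for the example. It also assumes the real page spells the header "Announcement Date" with a space, which you should check against the actual HTML.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up sample markup; the real page's header text is an assumption here.
my $html = <<'HTML';
<table>
  <tr><th>Company</th><th>Announcement Date</th></tr>
  <tr><td>Example Corp</td><td>2017-04-01</td></tr>
</table>
HTML

# Hypothetical helper: return the rows of every table whose header row
# contains all of the given header strings.
sub rows_for_headers {
    my ($doc, @headers) = @_;
    my $te = HTML::TableExtract->new( headers => \@headers );
    $te->parse($doc);
    return map { $_->rows } $te->tables;
}

my @miss = rows_for_headers($html, 'Announcement_Date');  # underscore: no such header
my @hit  = rows_for_headers($html, 'Announcement Date');  # space: matches the header cell

print "With underscore: ", scalar(@miss), " row(s)\n";
print "With space: ", scalar(@hit), " row(s)\n";
print join(',', @$_), "\n" for @hit;
```

Note that, by default, only the columns named in headers are returned, so list every column you want in that array.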