Extracting HTML from a URL using Perl


I want to extract the HTML code of a TWiki page (whose URL I have). What is the best way of doing that?

Additionally, once I have extracted the HTML code, I need to put it in a site hosted on Google Sites. Is that possible?


There are 2 best solutions below


Sounds like you need the CPAN HTML::Parser module.

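use strict;
use warnings;
use HTML::Parser ();

# Handlers: print each start and end tag as the parser encounters it
sub start { my ($tagname, $attr) = @_; print "<$tagname>\n"; }
sub end   { my ($tagname) = @_;        print "</$tagname>\n"; }

# Create parser object
my $p = HTML::Parser->new( api_version => 3,
                           start_h => [\&start, "tagname, attr"],
                           end_h   => [\&end,   "tagname"],
                           marked_sections => 1,
                         );

# Parse directly from file; parse_file returns undef if the file cannot be opened
$p->parse_file("foo.html") or die "Cannot parse foo.html: $!";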

A very simple way to get an HTML page is the LWP::Simple module. If you need a more complex navigation flow (logging in, following links, submitting forms), use WWW::Mechanize instead. Then, if you need to parse the HTML code, @brian's solution is a good one.
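
As a rough sketch of the LWP::Simple approach: the snippet below fetches a page and prints its HTML. The helper name fetch_html and the idea of passing the URL on the command line are assumptions made for illustration, not part of the original question. It also assumes the TWiki page is publicly readable; a page that requires a login would need WWW::Mechanize or LWP::UserAgent with credentials.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper: return the HTML of $url as a string, or die on failure.
sub fetch_html {
    my ($url) = @_;
    my $html = get($url);    # LWP::Simple::get returns undef on any error
    die "Could not fetch $url\n" unless defined $html;
    return $html;
}

# Usage: perl fetch.pl <url-of-the-twiki-page>
if (@ARGV) {
    print fetch_html($ARGV[0]);
}
```

The returned string can be saved to a file and handed to parse_file, or passed straight to an HTML::Parser object with $p->parse($html) followed by $p->eof.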