I have a page (e.g. http://www.stephenfry.com/) from which I want to extract all the links. I want to put every link of the form http://www.stephenfry.com/WHATEVER into an array. What I have so far is just the following:
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use HTML::Tree;
# I ONLY WANT TO USE JUST THESE
my $url = 'http://www.stephenfry.com/';
my $doc = get( $url );
my $adt = HTML::Tree->new();
$adt->parse( $doc );
my @juice = $adt->look_down(
    _tag => 'a',
    href => 'REGEX?'
);
I'm not sure how to select only these links.
You'll want to use the extract_links() method, not look_down().
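Here is a minimal sketch of the extract_links() approach, not necessarily the exact code originally posted. It assumes HTML::TreeBuilder (part of the HTML::Tree distribution) and URI (installed alongside LWP) are available. The helper name site_links and the sample markup are made up for illustration, so the snippet runs without touching the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use URI;

# Hypothetical helper: return every <a href> in $html that, once resolved
# against $base, starts with $base.
sub site_links {
    my ($html, $base) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @links;
    # extract_links('a') returns an array ref of
    # [ $link, $element, $attribute, $tag ] entries for <a> tags only.
    for my $entry (@{ $tree->extract_links('a') }) {
        my ($link) = @$entry;
        # Resolve relative links such as "/WHATEVER" against the page URL.
        my $abs = URI->new_abs($link, $base)->as_string;
        push @links, $abs if index($abs, $base) == 0;
    }
    $tree->delete;    # free the parse tree
    return @links;
}

# Made-up sample markup standing in for the fetched page.
my $html = <<'HTML';
<html><body>
<a href="/WHATEVER">relative link on the same site</a>
<a href="http://www.stephenfry.com/another/page">absolute link on the same site</a>
<a href="http://example.com/elsewhere">link to another site</a>
</body></html>
HTML

my $url   = 'http://www.stephenfry.com/';
my @juice = site_links($html, $url);
print "$_\n" for @juice;
```

To run it against the live page, pass the result of LWP::Simple's get($url) as the first argument instead of the sample string, and check that get() did not return undef first.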
Using WWW::Mechanize may be simpler, and it does return more links.
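A rough sketch of the WWW::Mechanize route, again an illustration rather than the code originally posted. It assumes WWW::Mechanize is installed; mech_site_links is a hypothetical helper name, and the URL is read from the command line so that nothing is fetched unless you ask for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url and return the absolute URL of every
# link on the page that starts with $url.
sub mech_site_links {
    my ($url) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 1 );    # die on HTTP errors
    $mech->get($url);
    # url_abs_regex is matched against each link's absolute URL.
    my @links = $mech->find_all_links( url_abs_regex => qr{^\Q$url\E} );
    return map { $_->url_abs->as_string } @links;
}

# Usage: perl links.pl http://www.stephenfry.com/
if ( my $url = shift @ARGV ) {
    print "$_\n" for mech_site_links($url);
}
```

If you only want anchors, find_all_links() also accepts tag => 'a' alongside the regex.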
Hope this helps!