I have a huge web page, about 5 GB in size, and I would like to read its content directly (remotely) without downloading the whole file. I tried opening the URL with an ordinary file handle, but the error message was "No such file or directory". I also tried LWP::Simple, but it ran out of memory when I used get to fetch the whole content. Is there a way to open this content remotely and read it line by line?
Thank you for your help.
About Perl reading the webpage online via HTTP
Asked by Chris Andrews
There are 2 best solutions below
This Perl code downloads a file from a URL in chunks and can resume if the file was already partially downloaded.
It requires that the server return the file size (Content-Length) in response to a HEAD request, and that the server support byte ranges for the URL in question.
If you want special processing for each chunk, replace the line under the "process next chunk" comment below:
    use strict;
    use warnings;
    use LWP::UserAgent;
    use List::Util qw(min);
    use IO::Handle;    # provides the flush method on file handles

    my $url  = "http://example.com/huge-file.bin";
    my $file = "huge-file.bin";
    DownloadUrl($url, $file);

    sub DownloadUrl {
        my ($url, $file, $chunksize) = @_;
        $chunksize ||= 1024 * 1024;
        my $ua   = LWP::UserAgent->new;
        my $res  = $ua->head($url);
        my $size = $res->header("Content-Length");
        die "Cannot get size for $url" unless defined $size;
        open my $fh, ">>", $file or die "ERROR: $!";
        binmode $fh;
        for (;;) {
            $fh->flush;              # make sure -s sees everything written so far
            my $range1 = -s $fh;     # resume from the current file size
            last if $range1 >= $size;
            # the end of a byte range is inclusive; the last byte is at $size - 1
            my $range2 = min($range1 + $chunksize, $size) - 1;
            $res = $ua->get($url, Range => "bytes=$range1-$range2");
            # stop unless the server honored the range (206 Partial Content)
            last unless $res->code == 206;
            # process next chunk:
            print $fh $res->content();
        }
        close $fh;
    }
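The two requirements stated above (a Content-Length on HEAD and byte-range support) can be checked before starting the loop. The following is only a sketch: range_download_info is a hypothetical helper name, and it assumes the server advertises range support through an "Accept-Ranges: bytes" header, which some servers that do support ranges may omit.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: inspects the response to a HEAD request and reports
# what the server says about the file size and byte-range support.
# Returns undef if the HEAD request itself was not successful.
sub range_download_info {
    my ($res) = @_;
    return unless $res->is_success;
    my $ranges = $res->header('Accept-Ranges');
    $ranges = '' unless defined $ranges;
    return {
        size          => $res->header('Content-Length'),
        accepts_bytes => ($ranges =~ /\bbytes\b/i ? 1 : 0),
    };
}
```

It could be called as range_download_info(LWP::UserAgent->new->head($url)). If accepts_bytes is 0, the chunked download may still work, but a single plain request is the safer fallback.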
You could try using LWP::UserAgent. The request method allows you to specify a CODE reference, which lets you process the data as it comes in. Technically the function should return the content instead of undef, but it seems to work if you return undef.
I haven't tried this on a large file, and you would need to write your own code to handle the data coming in as arbitrarily sized chunks.
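One way to handle those arbitrarily sized chunks is to buffer them and hand complete lines to a second callback. This is a minimal sketch, not the answerer's original code: make_line_splitter and read_url_by_line are hypothetical helper names, and it assumes lines end in "\n" (a trailing "\r" is left in place).

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper: returns two closures. $feed accepts a chunk of any
# size and calls $on_line once per complete line; $finish emits a final
# line that has no trailing newline.
sub make_line_splitter {
    my ($on_line) = @_;
    my $buffer = '';
    my $feed = sub {
        $buffer .= $_[0];
        # Remove and emit every complete line currently in the buffer.
        while ($buffer =~ s/\A([^\n]*)\n//) {
            $on_line->($1);
        }
        return;
    };
    my $finish = sub {
        $on_line->($buffer) if length $buffer;
        $buffer = '';
        return;
    };
    return ($feed, $finish);
}

# Streams $url through the request method's content callback, so only the
# current chunk and one partial line are held in memory at a time.
sub read_url_by_line {
    my ($url, $on_line) = @_;
    my ($feed, $finish) = make_line_splitter($on_line);
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->request(
        HTTP::Request->new(GET => $url),
        sub {
            my ($chunk, $response) = @_;
            # Ignore bodies of redirects and error responses.
            $feed->($chunk) if $response->is_success;
            return;
        },
    );
    die "Request failed: " . $res->status_line unless $res->is_success;
    $finish->();
    return;
}
```

A call such as read_url_by_line("http://example.com/huge-file.bin", sub { print "$_[0]\n" }) would then print the page one line at a time. The splitter is kept separate from the HTTP code so it can be tested without a network connection.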