WWW::Scripter package using 1.2 GB of memory while being used for web scraping


I am not very familiar with this package. Through a profiler I discovered that the use_plugin('JavaScript') method consumes a lot of memory. I swapped it for plugin('JavaScript'); the memory consumption was lower, but then I could not even get through the login form of the websites I am supposed to scrape.

Globally defined:

my $scripter = WWW::Scripter->new();
$scripter->use_plugin('JavaScript');
if(my $form = $scripter->form_with_fields("Password")){
  $form->value('Password', $conf->{'moxa_p'});
  $form->submit();
}else{
  print "dbg +> form 1.0 not found";
}

I tried using delete and undef on the object, but that does not help at all!


1 answer below

Answer by AnFi (accepted):

Reduce the stack of cached pages (WWW::Scripter / WWW::Mechanize).

Use max_docs in WWW::Scripter or stack_depth in WWW::Mechanize. The WWW::Mechanize man page recommends setting it to 5 or 10.

man WWW::Scripter

max_docs
The maximum number of document objects to keep in history (along with their corresponding request and response objects). If this is omitted, Mech's stack_depth + 1 will be used. This is off by one because stack_depth is the number of pages you can go back to, so it is one less than the number of recorded pages. max_docs considers 0 to be equivalent to infinity.
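For example (a minimal sketch; it assumes max_docs is accepted as an argument to new(), and the value of 10 follows the stack_depth recommendation below):

use strict;
use warnings;
use WWW::Scripter;

# Keep at most 10 documents (and their request/response objects)
# in history instead of the effectively unlimited default.
my $scripter = WWW::Scripter->new(max_docs => 10);
$scripter->use_plugin('JavaScript');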

man WWW::Mechanize

"stack_depth => $value"
Sets the depth of the page stack that keeps track of all the downloaded pages. Default is effectively infinite stack size. If the stack is eating up your memory, then set this to a smaller number, say 5 or 10. Setting this to zero means Mech will keep no history.
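The same idea applies to the underlying WWW::Mechanize object (a sketch; stack_depth can be passed to new() or called as a method, and WWW::Scripter inherits it as well):

use strict;
use warnings;
use WWW::Mechanize;

# Keep only the last 5 pages on the stack instead of the default
# (effectively infinite) history.
my $mech = WWW::Mechanize->new(stack_depth => 5);

# Alternatively, change it on an existing object:
# $mech->stack_depth(5);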