Perl Efficiency - Testing $ARGV inside a while(<>) loop

Is my understanding correct when I state the following:

It is wasting CPU cycles to test $ARGV (i.e. the file-name) for some condition inside a while(<>) loop. It is more efficient to test the file-name first, and then process each line accordingly inside a while() loop. This way it is not redundantly checking the file-name every time it grabs a line of data.

Or does the diamond operator do some magic to make this just as efficient as the latter?

There are 2 best solutions below

Is it wasting CPU cycles?

You are running Perl. Which runs on a VM. Do you have any idea how many pointer dereferences a simple lookup of a global variable implies? Why would you care about cycles? /s

Although the <> operator does imply a fair amount of magic, this does not optimize your loops. Therefore, in

use feature 'say';

my $lastfile = "";
while (<>) {
  # $ARGV holds the name of the file that <> is currently reading
  say "file changed to ", $lastfile = $ARGV if $lastfile ne $ARGV;
  print "> $_";
}

the ne check is executed for every single line. If you are optimizing for code size or development time, this is totally acceptable. If you are optimizing for the total number of opcodes executed, it may sometimes be cheaper to open the files explicitly:

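use feature 'say';
use autodie;  # open dies with a useful message on failure

for my $file (@ARGV) {
  open my $fh, "<", $file;
  say "file changed to $file";  # runs once per file, not once per line
  while (<$fh>) {
    print "> $_";
  }
}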

I personally find this form more elegant (single assignment form), and it isn't that much longer anyway. But in a quick one-liner, I'd use the first solution, because my script could have already run a thousand times while I was still writing the longer form.
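To make the "test the file name first" idea from the question concrete, here is a minimal sketch. The rule itself (lines from *.log files get a different prefix) and the name tag_lines are made up for illustration; the point is only that the file-name test is hoisted out of the per-line loop.

```perl
use strict;
use warnings;
use autodie;

# Hypothetical rule: lines from *.log files get a different prefix.
sub tag_lines {
    my @files = @_;
    my @tagged;
    for my $file (@files) {
        # The file-name test runs once per file, not once per line.
        my $prefix = $file =~ /\.log\z/ ? 'LOG> ' : '> ';
        open my $fh, '<', $file;
        while (my $line = <$fh>) {
            push @tagged, $prefix . $line;
        }
        close $fh;
    }
    return @tagged;
}

print tag_lines(@ARGV);
```

Unlike while (<>), this version does not fall back to STDIN when @ARGV is empty, so it is not a drop-in replacement if the script is also used in a pipeline.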

As a proof of concept (see below), I ran the two scripts that amon illustrated (named poc.pl and poc2.pl respectively) on a file of one billion lines. The former was 21.7% slower (291.16s vs. 239.29s real time). In conclusion, the difference is only significant when dealing with an enormous number of lines, in which case a lower-level language may be a better choice.

bash $  wc -l large.log
1000000000 large.log
bash $  time perl poc.pl large.log >/dev/null
  291.16s real   289.89s user     0.73s system
bash $  time perl poc2.pl large.log >/dev/null
  239.29s real   238.58s user     0.53s system
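
If you want to try the comparison without a billion-line file, something like the following sketch with the core Benchmark module might do. It is not the measurement above: the sub names per_line_check and explicit_open, the 200,000-line temp file, and the one-second run time are all assumptions, and it counts lines instead of printing them, so the relative difference will not match the timings shown.

```perl
use strict;
use warnings;
use autodie;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Build a throwaway input file; the line count is an arbitrary assumption.
my $lines = 200_000;
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. $lines;
close $tmp;

# Variant 1: while (<>) with a file-name comparison on every line.
sub per_line_check {
    local @ARGV = @_;   # <> reads the files named in @ARGV
    local $_;           # while (<>) assigns to the global $_
    my ($lastfile, $count) = ('', 0);
    while (<>) {
        $lastfile = $ARGV if $lastfile ne $ARGV;
        $count++;
    }
    return $count;
}

# Variant 2: open each file explicitly, no per-line file-name check.
sub explicit_open {
    local $_;
    my $count = 0;
    for my $file (@_) {
        open my $fh, '<', $file;
        while (<$fh>) {
            $count++;
        }
    }
    return $count;
}

# A negative count means "run each variant for at least that many CPU seconds".
cmpthese(-1, {
    per_line_check => sub { per_line_check($path) },
    explicit_open  => sub { explicit_open($path) },
});
```

cmpthese prints a rate table for the two variants; treat the numbers as a rough indication only, since they depend on the Perl build, the disk cache, and the machine.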