Why does replacing Perl's s/// with a dummy function using Inline::C cause a significant slow down?

106 Views Asked by At

I have an array of strings of about 100,000 elements. I need to iterate through each element and replace some words with other words. This takes a few seconds in pure perl. I need to speed this up as much as I can. I'm testing using the following snippet:

use strict;

my $string = "This is some string. Its only purpose is for testing.";
for( my $i = 1; $i < 100000; $i++ ) {
  $string =~ s/old1/new1/ig;
  $string =~ s/old2/new2/ig;
  $string =~ s/old3/new3/ig;
  $string =~ s/old4/new4/ig;
  $string =~ s/old5/new5/ig;
}

I know this doesn't actually replace anything in the test string, but it's for speed testing only.

I had my hopes set on Inline::C. I've never worked with Inline::C before but after reading up on it a bit, I thought it was fairly simple to implement. But apparently, even calling a stub function that does nothing is a lot slower. Here's the snippet I tested with:

use strict;
use Benchmark qw ( timethese );

use Inline 'C';

timethese(
   5,
   {
      "Pure Perl"  => \&pure_perl,
      "Inline C"   => \&inline_c
   }
);

sub pure_perl {
  my $string = "This is some string. Its only purpose is for testing.";
  for( my $i = 1; $i < 1000000; $i++ ) {
    $string =~ s/old1/new1/ig;
    $string =~ s/old2/new2/ig;
    $string =~ s/old3/new3/ig;
    $string =~ s/old4/new4/ig;
    $string =~ s/old5/new5/ig;
  }
}

sub inline_c {
  my $string = "This is some string. Its only purpose is for testing.";
  for( my $i = 1; $i < 1000000; $i++ ) {
    $string = findreplace( $string, "old1", "new1" );
    $string = findreplace( $string, "old2", "new2" );
    $string = findreplace( $string, "old3", "new3" );
    $string = findreplace( $string, "old4", "new4" );
    $string = findreplace( $string, "old5", "new5" );
  }
}

__DATA__
__C__

char *
findreplace( char *text, char *what, char *with ) {

  return text;

}

on my Linux box, the result is:

Benchmark: timing 5 iterations of Inline C, Pure Perl...
  Inline C:  6 wallclock secs ( 5.51 usr +  0.02 sys =  5.53 CPU) @  0.90/s (n=5)
  Pure Perl:  2 wallclock secs ( 2.51 usr +  0.00 sys =  2.51 CPU) @  1.99/s (n=5)

Pure Perl is twice as fast as calling an empty C function. Not at all what I expected! Again, I've never worked with Inline::C before so maybe I am missing something here?

1

There are 1 best solutions below

2
On

In the version using Inline::C, you kept everything that was in the original pure Perl script, and changed just one thing: Additionally, you've replaced Perl's highly optimized s/// with a worse implementation. Invoking your dummy function actually involves work whereas none of the s/// invocations do much in this case. It is a priori impossible for the Inline::C version to run faster.

On the C side, the function

char *
findreplace( char *text, char *what, char *with ) {

  return text;

}

is not a "do nothing" function. Calling it involves unpacking arguments. The string pointed to by text has to be copied to the return value. There is some overhead which you are paying for each invocation.

Given that s/// does no replacements, there is no copying involved in that. In addition, Perl's s/// is highly optimized. Are you sure you can write a better find & replace that is faster to make up for the overhead of calling an external function?

If you use the following implementation, you should get comparable speeds:

sub inline_c {
  my $string = "This is some string. It's only purpose is for testing.";
  for( my $i = 1; $i < 1000000; $i++ ) {
    findreplace( $string );
    findreplace( $string );
    findreplace( $string );
    findreplace( $string );
    findreplace( $string );
  }
}

__END__
__C__

void findreplace( char *text ) {
    return;

}
Benchmark: timing 5 iterations of Inline C, Pure Perl...
  Inline C:  6 wallclock secs ( 5.69 usr +  0.00 sys =  5.69 CPU) @  0.88/s (n=5)
 Pure Perl:  6 wallclock secs ( 5.70 usr +  0.00 sys =  5.70 CPU) @  0.88/s (n=5)

The one possibility of gaining speed is to exploit any special structure involved in the search pattern and replacements and write something to implement that.

On the Perl side, you should at least pre-compile the patterns.

Also, since your problem is embarrassingly parallel, you are better off looking into chopping up the work into as many chunks as you have cores to work with.

For example, take a look at the Perl entries in the regex-redux task in the Benchmarks Game:

Perl #4 (fork only): 14.13 seconds

and

Perl #3 (fork & threads): 14.47 seconds

versus

Perl #1: 34.01 seconds

That is, some primitive exploitation of parallelization possibilities results in a 60% speedup. That problem is not exactly comparable because the substitutions must be done sequentially, but still gives you an idea.

If you have eight cores, dole out the work to eight cores.

Also, consider the following script:

#!/usr/bin/env perl

use warnings;
use strict;

use Data::Fake::Text;
use List::Util qw( sum );
use Time::HiRes qw( time );

use constant INPUT_SIZE => $ARGV[0] // 1_000_000;

run();

sub run {
    my @substitutions = (
        sub { s/dolor/new1/ig   },
        sub { s/fuga/new2/ig    },
        sub { s/facilis/new3/ig },
        sub { s/tempo/new4/ig   },
        sub { s/magni/new5/ig   },
    );

    my @times;
    for (1 .. 5) {
        my $data = read_input();
        my $t0 = time;
        find_and_replace($data, \@substitutions);
        push @times, time - $t0;
    }

    printf "%.4f\n", sum(@times)/@times;

    return;
}

sub find_and_replace {
    my $data = shift;
    my $substitutions = shift;

    for ( @$data ) {
        for my $s ( @$substitutions ) {
            $s->();
        }
    }
    return;
}

{
    my @input;
    sub read_input {
        @input
            or @input = map fake_sentences(1)->(), 1 .. INPUT_SIZE;
        return [ @input ];
    }
}

In this case, each invocation of find_and_replace takes about 2.3 seconds my laptop. The five replications run in about 30 seconds. The overhead is the combined cost of generating the 1,000,000 sentence data set and copying it four times.