Why is my Rcpp mean function slower than R?

338 Views Asked by At

I want to create a C++ function that raises each element in x to power and takes the mean. I've created three versions:

  • power_mean_R: R solution -- mean(x^power)
  • power_mean_C: C++ solution
  • power_mean_C_2arg: C++ solution with extra power argument

The extra power argument seems to drastically slow down the function to the point that it's slower than the R implementation. Is this a reality of using Rcpp or is there something I can improve in my code?

#library(Rcpp)

cppFunction(
    'double power_mean_C_2arg(NumericVector x, double power) {

        int n = x.size();
        double total = 0;

        for(int i=0; i<n; ++i) {
            total += pow(x[i], power);
        }

        return total / n;

    }'
)

cppFunction(
    'double power_mean_C(NumericVector x) {

        int n = x.size();
        double total = 0;

        for(int i=0; i<n; ++i) {
            total += pow(x[i], 2);
        }

        return total / n;

    }'
)

power_mean_R <- function(x, power) {
    mean(x^power)
}

bench::mark(
    R = power_mean_R(1:100, p = 2),
    C = power_mean_C(1:100),
    C2arg = power_mean_C_2arg(x = 1:100, p = 2)
)

  expression    min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory
  <bch:expr> <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>
1 R          5.91µs 6.91µs   112386.    1.27KB      0   10000     0       89ms <dbl … <Rpro…
2 C          2.87µs 3.54µs   231281.    3.32KB     23.1  9999     1     43.2ms <dbl … <Rpro…
3 C2arg      6.02µs 6.89µs   116187.    3.32KB      0   10000     0     86.1ms <dbl … <Rpro…

1

There are 1 best solutions below

1
On BEST ANSWER

There are few things that handicap your C++ function with the examples you gave

1:100 is an ALTREP sequence, for which a highly optimized sum method exists which is much faster. In the below extreme examples, it's over 6 million times faster. Obviously the vector is not altrep all the way through, but it's a bad idea to benchmark on altrep sequences.

billion <- c(1L, 2:1e9)
billalt <- 1:1e9

identical(billion, billalt)
#> [1] TRUE

bench::mark(sum(billion), sum(billalt))
#> # A tibble: 2 x 10
#>   expression             min            mean          median          max
#>   <chr>             <bch:tm>        <bch:tm>        <bch:tm>     <bch:tm>
#> 1 sum(billi~ 614564900.000ns 614564900.000ns 614564900.000ns 614564.900us
#> 2 sum(billa~       100.000ns       312.530ns       200.000ns     23.300us
#> # ... with 5 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>, n_gc <dbl>,
#> #   n_itr <int>, total_time <bch:tm>

Created on 2020-12-11 by the reprex package (v0.3.0)

Second, 1:100 is an integer vector but your Rcpp function accepts a numeric vector, so the data must be coerced to type double before any operations are done. For a vector so small, it's likely to be a considerable part of the overhead.

Your test vector is very small so overheads like Rcpp's preservation of random seeds are going to dominate the differences.