Can somebody explain to me somewhat odd behavior of simple for loop written using Rcpp (code is below). Based on microbenchmark output it seems that the algorithmic complexity of the for_iteration
is constant which is not true based on its code. For comparison I tested function for_double_iteration
and its behavior is consistent with its code complexity. This code was run on Ubuntu 16.04 and CPU Intel Core i3-6100 but the same result was obtained on Windows 7 and CPU Intel Core i5-2300.
Here is the code:
library(Rcpp)
library(microbenchmark)
sourceCpp(code='
#include <Rcpp.h>
// [[Rcpp::export]]
int for_iteration(const int n) {
int j = 0;
for (int i = 0; i < n; i++) {
j++;
}
return (j);
}
// [[Rcpp::export]]
int for_double_iteration(const int n) {
int j = 0, k = 0;
for (int i = 0; i < n; i++) {
for (k = 0; k < i; k++) {
j++;
}
}
return (j);
}'
)
Here are the benchmarks:
> microbenchmark(for_iteration(10^5),
for_iteration(10^6),
for_iteration(10^7),
for_iteration(10^8),
for_iteration(10^9),
times=1000, unit="us")
Unit: microseconds
expr min lq mean median uq max neval
for_iteration(10^5) 1.254 1.379 1.552229 1.4305 1.5255 9.197 1000
for_iteration(10^6) 1.268 1.379 1.724993 1.4300 1.5410 123.822 1000
for_iteration(10^7) 1.274 1.377 3.687182 1.4240 1.5075 2126.909 1000
for_iteration(10^8) 1.253 1.387 1.546345 1.4360 1.5320 9.527 1000
for_iteration(10^9) 1.265 1.386 1.568382 1.4300 1.5230 20.307 1000
> microbenchmark(for_double_iteration(10^2),
for_double_iteration(10^3),
for_double_iteration(10^4),
for_double_iteration(10^5),
for_double_iteration(10^6),
times=1000, unit="us")
Unit: microseconds
expr min lq mean median uq max neval
for_double_iteration(10^2) 0.921 1.0230 1.516304 1.1100 1.4335 23.308 1000
for_double_iteration(10^3) 1.722 1.7970 2.491999 1.8915 2.2270 49.772 1000
for_double_iteration(10^4) 9.022 9.1165 9.947209 9.2050 9.6925 55.841 1000
for_double_iteration(10^5) 82.170 82.2700 86.240153 82.3590 82.9070 1959.903 1000
for_double_iteration(10^6) 813.723 814.4450 834.870686 826.4625 828.2280 1178.062 1000
My guess is correct, it is optimized out and replaced by
j = n
.Have a check on godbolt, with compiler flag
-O2
, you get assembly output (gas):Oops... there is no loop at all!!