I have two large matrices where some values must be recoded and after that, these two matrices must be multiplied, as shown in the following example. Using Rcpp is faster only if multiples Threads are used.
Load R Functions and Create Fake Data
library("RcppEigen")
library("benchr")
library("Rcpp")
x_mat <- matrix(sample(as.numeric(0:2), 500*2000, replace = TRUE), ncol = 2000)
y_mat <- matrix(sample(as.numeric(0:2), 5000*2000, replace = TRUE), ncol = 2000)
R Function
recode1 <- function(x, y) {
x[x != 0] <- -1
x <- x + 1
y[y != 2] <- 1
y <- y - 1
oh <- tcrossprod(x, y)
return(oh)
}
Rcpp Function
sourceCpp(code = "// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::plugins(openmp)]]
#include <omp.h>
#include <Rcpp.h>
#include <RcppEigen.h>
// [[Rcpp::export]]
SEXP recode2(Eigen::Map<Eigen::MatrixXd> x,
Eigen::Map<Eigen::MatrixXd> y,
int n_cores){
int nrow_x = x.rows();
int ncol_x = x.cols();
int nrow_y = y.rows();
int ncol_y = y.cols();
Eigen::setNbThreads(n_cores);
for (int i = 0; i < nrow_x; i++) {
for (int j = 0; j < ncol_x; j++) {
if (x(i, j) == 0) {
x(i, j) = 1;
} else {
x(i, j) = 0;
}
}
}
for (int i = 0; i < nrow_y; i++) {
for (int j = 0; j < ncol_y; j++) {
if (y(i, j) == 2) {
y(i, j) = 1;
} else {
y(i, j) = 0;
}
}
}
Eigen::MatrixXd C = x * y.transpose();
return Rcpp::wrap(C);
}")
Benchmark using 10 threads
Time <- benchmark (
BaseFUN = recode1(x = x_mat, y = y_mat),
RcppFUN = recode2(x = x_mat, y = y_mat, n_cores = 10),
times = 100
)
print(Time, order = "median")
Benchmark summary:
Time units : milliseconds
expr n.eval min lw.qu median mean up.qu max total relative
RcppFUN 100 153 165 172 179 179 488 17900 1.0
BaseFUN 100 272 352 378 415 446 680 41500 2.2
Benchmark using a single thread
Time <- benchmark (
BaseFUN = recode1(x = x_mat, y = y_mat),
RcppFUN = recode2(x = x_mat, y = y_mat, n_cores = 1),
times = 100
)
print(Time, order = "median")
Benchmark summary:
Time units : milliseconds
expr n.eval min lw.qu median mean up.qu max total relative
BaseFUN 100 258 338 360 403 473 629 40300 1.00
RcppFUN 100 765 775 780 784 788 859 78400 2.17
Update
Since the number of columns must be the same, I've re-written the function to have the main for loop on columns first.
sourceCpp(code = "// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::plugins(openmp)]]
#include <omp.h>
#include <Rcpp.h>
#include <RcppEigen.h>
// [[Rcpp::export]]
SEXP recode3(Eigen::Map<Eigen::MatrixXd> x,
Eigen::Map<Eigen::MatrixXd> y,
int n_cores){
int nrow_x = x.rows();
int ncol_x = x.cols();
int nrow_y = y.rows();
int ncol_y = y.cols();
Eigen::setNbThreads(n_cores);
for (int j = 0; j < ncol_x; j++) {
for (int ix = 0; ix < nrow_x; ix++) {
if (x(ix, j) == 0) {
x(ix, j) = 1;
} else {
x(ix, j) = 0;
}
}
for (int iy = 0; iy < nrow_y; iy++) {
if (y(iy, j) == 2) {
y(iy, j) = 1;
} else {
y(iy, j) = 0;
}
}
}
Eigen::MatrixXd C = x * y.transpose();
return Rcpp::wrap(C);
}")
Benchmark update
Benchmark summary:
Time units : milliseconds
expr n.eval min lw.qu median mean up.qu max total relative
Rcpp2FUN 50 104 112 118 120 124 157 5980 1.00
Rcpp1FUN 50 156 168 174 178 179 235 8910 1.48
BaseRFUN 50 283 347 377 427 539 785 21300 3.21
Would anyone suggest a better/optimized way to make it faster? Is there a better way to avoid/optimize all those for loop
?
Thank you.