Faster Recode and Matrix Multiplication using Rcpp/RcppEigen

165 Views Asked by At

I have two large matrices where some values must be recoded and after that, these two matrices must be multiplied, as shown in the following example. Using Rcpp is faster only if multiples Threads are used.

Load R Functions and Create Fake Data

library("RcppEigen")
library("benchr")
library("Rcpp")

x_mat <- matrix(sample(as.numeric(0:2), 500*2000, replace = TRUE), ncol = 2000)
y_mat <- matrix(sample(as.numeric(0:2), 5000*2000, replace = TRUE), ncol = 2000)

R Function

recode1 <- function(x, y) {
  x[x != 0] <- -1
  x <- x + 1
  y[y != 2] <-  1
  y <- y - 1
  oh <- tcrossprod(x, y)
  return(oh)
}

Rcpp Function

sourceCpp(code = "// [[Rcpp::depends(RcppEigen)]]
  // [[Rcpp::plugins(openmp)]]
  
  #include <omp.h>
  #include <Rcpp.h>
  #include <RcppEigen.h>
  
  // [[Rcpp::export]]
  SEXP recode2(Eigen::Map<Eigen::MatrixXd> x,
               Eigen::Map<Eigen::MatrixXd> y, 
               int n_cores){
                  
  int nrow_x = x.rows();
  int ncol_x = x.cols();
  int nrow_y = y.rows();
  int ncol_y = y.cols();
  
  Eigen::setNbThreads(n_cores);
  for (int i = 0; i < nrow_x; i++) {
    for (int j = 0; j < ncol_x; j++) {
      if (x(i, j) == 0) { 
        x(i, j) = 1;
      } else {
        x(i, j) = 0;
      }
    }
  }
  for (int i = 0; i < nrow_y; i++) {
    for (int j = 0; j < ncol_y; j++) {
      if (y(i, j) == 2) { 
        y(i, j) = 1;
      } else {
        y(i, j) = 0;
      }
    }
  }
  Eigen::MatrixXd C = x * y.transpose();
  return Rcpp::wrap(C);
}")

Benchmark using 10 threads

Time <- benchmark (
  BaseFUN = recode1(x = x_mat, y = y_mat), 
  RcppFUN = recode2(x = x_mat, y = y_mat, n_cores = 10),
  times = 100
)
print(Time, order = "median")

Benchmark summary:
  Time units : milliseconds 
   expr n.eval min lw.qu median mean up.qu max total relative
RcppFUN    100 153   165    172  179   179 488 17900      1.0
BaseFUN    100 272   352    378  415   446 680 41500      2.2

Benchmark using a single thread

Time <- benchmark (
  BaseFUN = recode1(x = x_mat, y = y_mat), 
  RcppFUN = recode2(x = x_mat, y = y_mat, n_cores = 1),
  times = 100
)
print(Time, order = "median")

Benchmark summary:
  Time units : milliseconds 
   expr n.eval min lw.qu median mean up.qu max total relative
BaseFUN    100 258   338    360  403   473 629 40300     1.00
RcppFUN    100 765   775    780  784   788 859 78400     2.17

Update

Since the number of columns must be the same, I've re-written the function to have the main for loop on columns first.

sourceCpp(code = "// [[Rcpp::depends(RcppEigen)]]
  // [[Rcpp::plugins(openmp)]]
  
  #include <omp.h>
  #include <Rcpp.h>
  #include <RcppEigen.h>
  
  // [[Rcpp::export]]
  SEXP recode3(Eigen::Map<Eigen::MatrixXd> x,
               Eigen::Map<Eigen::MatrixXd> y, 
               int n_cores){
                  
  int nrow_x = x.rows();
  int ncol_x = x.cols();
  int nrow_y = y.rows();
  int ncol_y = y.cols();
  
  Eigen::setNbThreads(n_cores);
  for (int j = 0; j < ncol_x; j++) {
    for (int ix = 0; ix < nrow_x; ix++) {
      if (x(ix, j) == 0) { 
        x(ix, j) = 1;
      } else {
        x(ix, j) = 0;
      }
    }
    for (int iy = 0; iy < nrow_y; iy++) {
      if (y(iy, j) == 2) {
        y(iy, j) = 1;
      } else {
        y(iy, j) = 0;
      }
    }
  }
  Eigen::MatrixXd C = x * y.transpose();
  return Rcpp::wrap(C);
}")

Benchmark update

Benchmark summary:
Time units : milliseconds 
     expr n.eval min lw.qu median mean up.qu max total relative
 Rcpp2FUN     50 104   112    118  120   124 157  5980     1.00
 Rcpp1FUN     50 156   168    174  178   179 235  8910     1.48
 BaseRFUN     50 283   347    377  427   539 785 21300     3.21

Would anyone suggest a better/optimized way to make it faster? Is there a better way to avoid/optimize all those for loop?

Thank you.

0

There are 0 best solutions below