Why is my RcppEigen code almost 40 times slower than R's base LAPACK implementation?

137 Views Asked by At

I'm trying to implement efficient matrix LLT (or LU, whatever) decomposition for some covariance matrix shenanigans.

I've been using the Eigen library (through RcppEigen) for a while and, just for fun, I wanted to see how much better it fared against LAPACK, which, according to the famous How does Eigen compare to BLAS/LAPACK? link, it should be considerable.

Trying to make the best of my memory, I played with the triangularView class. However, after bad results, I tried with the full matrix as well, to disappointing results. I paste the code that I'm using here, as it's pretty simple (and minimal). Can someone help me realize if I'm doing something wrong?

library(Rcpp)

code <- '
#include <RcppEigen.h>
using namespace Eigen;
// [[Rcpp::depends("RcppEigen")]]

// [[Rcpp::export]]
MatrixXd invertTriangular(MatrixXd input) {
  LLT<MatrixXd> solver;
  solver.compute(input.triangularView<Lower>());
  return solver.solve(MatrixXd::Identity(input.rows(), input.cols()));
}

// [[Rcpp::export]]
MatrixXd invertTriangular2(MatrixXd input) {
  return input.selfadjointView<Lower>().llt().solve(MatrixXd::Identity(input.rows(), input.cols()));
}

// [[Rcpp::export]]
MatrixXd invertFull(MatrixXd input) {
  LLT<MatrixXd> solver;
  solver.compute(input);
  return solver.solve(MatrixXd::Identity(input.rows(), input.cols()));
}

// [[Rcpp::export]]
MatrixXd invertFull2(MatrixXd input) {
  return input.llt().solve(MatrixXd::Identity(input.rows(), input.cols()));
}
'

sourceCpp(code = code)

set.seed(123)
points <- cbind(runif(400), runif(400))
distances <- as.matrix(dist(points))
covars <- exp(- distances / 0.3) * 2
{
  othercovars <- covars
  othercovars[1, 2:4] <- 0
  othercovars[2, 3:4] <- 0
  othercovars[3, 4] <- 0
}

(compare <- microbenchmark::microbenchmark(
  LAPACK = solve(covars),
  EigenTriangular = invertTriangular(covars),
  EigenOtherTriangular = invertTriangular(othercovars),
  EigenTriangular2 = invertTriangular2(covars),
  EigenOtherTriangular2 = invertTriangular2(othercovars),
  EigenFull = invertFull(covars),
  EigenFull2 = invertFull2(covars)
))
sessionInfo()

Here is the output I get:

Unit: milliseconds
                  expr       min        lq      mean    median        uq       max neval
                LAPACK  14.25909  14.63404  15.93704  15.72874  17.14541  21.25099   100
       EigenTriangular 624.28999 629.22478 632.95525 632.26816 634.70518 667.97436   100
  EigenOtherTriangular 623.51207 629.50938 632.29419 631.77406 634.08616 657.11286   100
      EigenTriangular2 623.11969 628.45789 631.08457 630.54866 632.65461 656.86033   100
 EigenOtherTriangular2 620.54096 628.21065 631.11047 630.80675 633.41481 650.41078   100
             EigenFull 622.25617 627.68885 630.01524 629.76616 632.19636 645.39284   100
            EigenFull2 623.26877 628.02666 629.86041 629.65087 631.96649 637.55556   100
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C               LC_TIME=pt_PT.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rcpp_1.0.9

loaded via a namespace (and not attached):
[1] microbenchmark_1.4.9 compiler_4.2.2       RcppEigen_0.3.3.9.3  Matrix_1.5-3         tools_4.2.2         
[6] grid_4.2.2           lattice_0.20-45 

So, Eigen's way takes about 39.375 (630 / 16) longer to make this inversion than LAPACK.

What I tried: Different ways to achieve matrix inversion with Eigen. Result: They are all significantly slower than R base using LAPACK.

1

There are 1 best solutions below

0
On

Discord (Toby) gave the answer. You need optimization flags to get crazy Eigen speed. Added -O3 and Eigen is now faster than LAPACK by a bit.