I am fitting a statistical model whose likelihood and gradient evaluation depend on many complicated matrix operations, for which the Eigen library is very well-suited. I am working within Rcpp and having great success with RcppEigen. My likelihood/gradient contain a loop over the data size that is, in theory, trivially parallelizable. I would like to parallelize my likelihood computation using RcppParallel.
RcppParallel requires inputs of type Rcpp::NumericVector or Rcpp::NumericMatrix. I am able to convert between Eigen::MatrixXd and Rcpp::NumericMatrix thanks to a very helpful post here. In my example below, I reproduce one of the RcppParallel examples using this conversion. However, the performance hit of doing so appears too severe to make parallel computation worthwhile.
My questions are as follows:
- Can an RcppParallel Worker take/operate on an Eigen::MatrixXd directly?
- Can anybody suggest an alternative method of parallelizing a loop involving a significant amount of Eigen-specific matrix algebra?
Some further information:
- Re-writing my entire program to not use Eigen is very undesirable.
- My local machine is an M1 Mac; I have access to servers running Ubuntu, so I can move to that environment if it might help.
Here is an example from the RcppParallel documentation, and a modification using Eigen classes that compiles, runs, and gives the right answer, but incurs a substantial performance hit. My desired outcome is to input and output the Eigen-classed objects without incurring this performance hit, either with RcppParallel or some other method of parallelization.
Note that this case doesn't even cover what I actually want, which is to use Eigen::MatrixXd etc. within the RcppParallel Worker. The RcppParallel documentation appears to explicitly advise against doing this type of conversion within the Worker: "The code that you write within parallel workers should not call the R or Rcpp API in any fashion." Nonetheless, even the code below incurs a performance hit, which has me wondering about the overall feasibility of what I would like to do here.
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>  // pulls in Rcpp.h; no separate include needed
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
#include <algorithm>    // std::transform
#include <cmath>
using namespace Rcpp;
using namespace RcppParallel;
// Square root all the elements in a big matrix
struct SquareRoot : public Worker {
  const RMatrix<double> input;
  RMatrix<double> output;

  // Constructor: take read-only and writable views of the R matrices
  SquareRoot(const NumericMatrix input, NumericMatrix output)
    : input(input), output(output) {}

  // Call operator: process elements [begin, end) of the flattened matrix
  void operator()(std::size_t begin, std::size_t end) {
    std::transform(input.begin() + begin,
                   input.begin() + end,
                   output.begin() + begin,
                   [](double x) { return std::sqrt(x); });
  }
};
// [[Rcpp::export]]
NumericMatrix parallelMatrixSqrt(NumericMatrix x) {
  NumericMatrix output(x.nrow(), x.ncol());
  SquareRoot mysqrt(x, output);
  parallelFor(0, x.length(), mysqrt, 100);
  return output;
}
// Try it with an Eigen input and output, compare performance
// [[Rcpp::export]]
Eigen::MatrixXd parallelMatrixSqrtEigen(Eigen::MatrixXd x) {
  // Convert the Eigen input to a NumericMatrix (deep copy)
  SEXP s = wrap(x);
  NumericMatrix xR(s);
  NumericMatrix output(x.rows(), x.cols());

  // Do the parallel computations
  SquareRoot mysqrt(xR, output);
  parallelFor(0, xR.length(), mysqrt, 100);

  // Convert back to an Eigen matrix (another deep copy)
  return as<Eigen::MatrixXd>(output);
}
Performance:
codepath <- "~/work/projects/misc/rcppparallel"
codefile <- "try-rcppparallel.cpp"
library(Rcpp)
library(RcppParallel)
sourceCpp(file.path(codepath,codefile))
n <- 1000
mm <- matrix(rnorm(n*n)^2,n,n)
microbenchmark::microbenchmark(
sqrt(mm),
parallelMatrixSqrt(mm),
parallelMatrixSqrtEigen(mm)
)
Results:
Unit: microseconds
expr min lq mean median uq max neval cld
sqrt(mm) 1179.242 1294.6775 1692.3123 1384.7955 1464.213 3581.924 100 b
parallelMatrixSqrt(mm) 321.973 496.3665 909.3078 571.0685 784.617 3582.785 100 a
parallelMatrixSqrtEigen(mm) 3223.133 3504.1675 4422.4277 4071.7715 5313.190 6737.735 100 c
A major thank you in advance!