I wrote a piece of Rcpp code to speed up some treatment involving a R list of numeric matrices. As the number of matrix within the R list may reach hundreds of thousands, I started parallelizing this Rcpp code using openMP.
But as explained multiple times in many posts here, Rcpp objects (e.g. Rcpp::List and Rcpp::NumericMatrix) cannot be safely involved in multi-threaded sections, as they are subject to R's memory management and its garbage collector.
So I made and tested two versions of the multi-threaded Rcpp function:
i) - One uses a const arma::field<arma::mat> as function 1st parameter and using it as is in the parallelized omp section.
Rcpp::List myRcppFunction1(const arma::field<arma::mat> & InField,
unsigned int threads_nb);
However, this implies deep copy of the data when the R-side list is implicitly converted to arma::fieldarma::mat.
ii) - The other uses a Rcpp::List as 1st parameter, and defines
a std::vector<double*> whose elements point to the matrix
addresses inside the Rcpp::List. Only this std::vector
is involved in the parallelized section, as following:
Rcpp::List myRcppFunction2(const Rcpp::List & InList,
arma::uword col_nb,
arma::uword row_nb,
unsigned int threads_nb){
int n = InList.length();
// vector storing the memory address of each matrix of InList
std::vector<double*> InMatPtrVec(n);
// fill the std::vector
for(size_t i=0; i<InMatPtrVec.size(); ++i){
NumericMatrix m = InList[i];
InMatPtrVec[i] = &(m[0]);
}
// an writable arma::field<arma::mat> shared by all threads.
arma::field<arma::mat> OutField(n);
// multi-threaded section
#pragma omp parallel num_threads(threads_nb)
{
#pragma omp for
for(arma::uword i=0; i < n; ++i){
// Here arma::mat advanced constructor allows to re-use
// a R memory block without involving the R object owning
// this memory in the parallelized section.
const arma::mat InMat_i(InMatPtrVec[i], col_nb, row_nb, false, true);
// do something involving InMat_i (read only)
// ...
// A reference to OutField[i]
arma::mat & OutMat_i = OutField[i];
// do something involving OutMat_i (read/write).
// ...
} // end of parallelized for loop
} // end of multi-threaded section
// Return outputs as a R list
return(Rcpp::List::create(OutField, n);
}
This 2nd version using pointers and arma::mat advanced constructor avoid a deep copy of the R side list data.
This is how I call these functions in a R session:
# This list may contain hundreds of thousands of numeric matrix
inputList = list(...)
outputList1 = myRcppFunction1(inputList, threads_nb=4)
outputList2 = myRcppFunction2(inputList, threads_nb=4)
Both versions seem to do the job and gives the same results. I have repetitively tested it and have never encountered any segfault nor crashes nor stack imbalance warnings yet.
Moreover, I lastly did tests in gctorture(TRUE) context and both versions kept working as expected.
Is this a sufficient "proof" that both versions and especially the second one using std::vector<double*> are safe?