I wrote a piece of Rcpp code to speed up some treatment involving a R list of numeric matrices. As the number of matrix within the R list may reach hundreds of thousands, I started parallelizing this Rcpp code using openMP.

But as explained multiple times in many posts here, Rcpp objects (e.g. Rcpp::List and Rcpp::NumericMatrix) cannot be safely involved in multi-threaded sections, as they are subject to R's memory management and its garbage collector.

So I made and tested two versions of the multi-threaded Rcpp function:

i) - One uses a const arma::field<arma::mat> as function 1st parameter and using it as is in the parallelized omp section.

Rcpp::List myRcppFunction1(const arma::field<arma::mat> & InField,
                           unsigned int threads_nb);

However, this implies deep copy of the data when the R-side list is implicitly converted to arma::fieldarma::mat.

ii) - The other uses a Rcpp::List as 1st parameter, and defines a std::vector<double*> whose elements point to the matrix addresses inside the Rcpp::List. Only this std::vector is involved in the parallelized section, as following:

Rcpp::List myRcppFunction2(const Rcpp::List & InList,
                           arma::uword col_nb,
                           arma::uword row_nb,
                           unsigned int threads_nb){
  int n = InList.length();

  // vector storing the memory address of each matrix of InList
  std::vector<double*> InMatPtrVec(n);

  // fill the std::vector
  for(size_t i=0; i<InMatPtrVec.size(); ++i){
    NumericMatrix m = InList[i];
    InMatPtrVec[i] = &(m[0]);
  }
  // an writable arma::field<arma::mat> shared by all threads. 
  arma::field<arma::mat> OutField(n);

  // multi-threaded section
  #pragma omp parallel num_threads(threads_nb)
  {
    #pragma omp for
    for(arma::uword i=0; i < n; ++i){

      // Here arma::mat advanced constructor allows to re-use
      // a R memory block without involving the R object owning
      // this memory in the parallelized section.
      const arma::mat InMat_i(InMatPtrVec[i], col_nb, row_nb, false, true);

      // do something involving InMat_i (read only)
      // ...

      // A reference to OutField[i]
      arma::mat & OutMat_i = OutField[i];
      // do something involving OutMat_i (read/write).
      // ...

    } // end of parallelized for loop

  } // end of multi-threaded section

  // Return outputs as a R list
  return(Rcpp::List::create(OutField, n);
}

This 2nd version using pointers and arma::mat advanced constructor avoid a deep copy of the R side list data.

This is how I call these functions in a R session:

# This list may contain hundreds of thousands of numeric matrix
inputList = list(...)
outputList1 = myRcppFunction1(inputList, threads_nb=4)
outputList2 = myRcppFunction2(inputList, threads_nb=4)

Both versions seem to do the job and gives the same results. I have repetitively tested it and have never encountered any segfault nor crashes nor stack imbalance warnings yet.

Moreover, I lastly did tests in gctorture(TRUE) context and both versions kept working as expected.

Is this a sufficient "proof" that both versions and especially the second one using std::vector<double*> are safe?

0

There are 0 best solutions below