Seeding for multithreaded unif_rand()

682 Views Asked by At

I want to seed R's internal unif_rand() in a multithreaded environment. The code below generates a 2-column matrix of uniform random numbers within 2 threads. The results are interesting.

struct mtRunif: public RcppParallel::Worker
{
  int Nrow; // number of rows in matrix.
  double *v; // point to the 0th element of the 0th column.
  void operator() (std::size_t st, std::size_t end)
  {
    // st = 0 in the 0th thread, 1 in the 1st thread. 
    double *vst = v + st * Nrow;
    for(int i = 0; i < Nrow; ++i)
    {
      vst[i] = unif_rand();
    }
  }


  mtRunif(int Nrow, double *v): Nrow(Nrow), v(v)
  {
    RcppParallel::parallelFor(0, 2, *this);
  }
};


// [[Rcpp::export]] 
NumericMatrix testSeeding(int sampleSize)
{
  NumericMatrix rst(sampleSize, 2);
  mtRunif(sampleSize, &*rst.begin());
  return rst;
}


/***R
N = 100
set.seed(42); tmp = testSeeding(N) 
set.seed(42); tmp2 = testSeeding(N) 
# see if sequences are identical
range(tmp[, 1] - tmp2[, 1]); range(tmp[, 2] - tmp2[, 2])
# [1] 0 0
# [1] 0 0


N = 1000
set.seed(42); tmp = testSeeding(N) 
set.seed(42); tmp2 = testSeeding(N) 
range(tmp[, 1] - tmp2[, 1]); range(tmp[, 2] - tmp2[, 2])
# [1] -0.9655154  0.8989870
# [1] -0.969356  0.963239
*/

The results suggest set.seed() controls the randomness in all threads for small sample sizes? Initially I expected set.seed() would be effective in no more than 1 thread. I do not want to exploit the conclusion because it could be absolutely wrong. On the other hand, is there a seeding function for unif_rand() akin to std::srand() for std::rand()?

Thank you!

2

There are 2 best solutions below

2
On BEST ANSWER

After advertising dqrng in the comments I realized that I had not written any documentation on how to use the RNGs from that package for parallel usage. So I started a new vignette, that will be part of the next release. Here one of the examples, which is quite similar to what you were trying to do:

#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
#include <pcg_random.hpp>
#include <dqrng_distribution.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
// [[Rcpp::plugins(cpp11)]]

struct RandomFill : public RcppParallel::Worker {
  RcppParallel::RMatrix<double> output;
  uint64_t seed;
  dqrng::normal_distribution dist{0.0, 1.0};

  RandomFill(Rcpp::NumericMatrix output, const uint64_t seed) : output(output), seed(seed) {};

  void operator()(std::size_t begin, std::size_t end) {
    pcg64 rng(seed, end); // ctor with seed and stream id
    auto gen = std::bind(dist, rng);
    std::generate(output.begin() + begin * output.nrow(),
                  output.begin() + end * output.nrow(),
                  std::ref(gen));
  }
};

// [[Rcpp::export]]
Rcpp::NumericMatrix parallel_random_matrix(const int n, const int m, const int ncores) {
  Rcpp::NumericMatrix res(n, m);
  RandomFill randomFill(res, 42);
  RcppParallel::parallelFor(0, m, randomFill, m/ncores + 1);
  return res;
}

/*** R
res <- parallel_random_matrix(1e6, 8, 4)
head(res)
*/

Result:

> res <- parallel_random_matrix(1e6, 8, 4)

> head(res)
           [,1]        [,2]        [,3]       [,4]       [,5]       [,6]       [,7]       [,8]
[1,]  0.7114429 -0.19759808 -0.47149983  0.6046378 -0.3709571 -0.8089533  0.8185977 0.49010575
[2,]  0.8721661 -0.47654248  1.10411136 -1.6290995 -1.3276661 -0.2585322 -1.2437521 0.90325167
[3,] -1.4959624  0.61068373 -0.54343828 -0.4623555 -1.1779352 -2.8068283 -0.4341252 1.74490995
[4,]  0.5087201 -0.05175746  0.19007581 -0.7869679  0.9672267 -0.5009787 -0.5283977 1.42487290
[5,] -0.8191448 -0.77348120 -0.03458304  0.7243224  1.0594094 -0.6951184 -0.5456669 0.00894037
[6,]  1.2289518 -2.33539762  0.40222707 -2.3346460 -0.5796549 -0.3092356  2.8961294 0.16773085

BTW, please do not sue std::rand(). If you want to use the standard library, then please use something like std::mt19937 from random with C++11.

2
On

In short: you can't do this with R for R-internal reasons, and that has been documented extensively.

There are also statistical issues about RNGs and streams. So you most likely want to look into "streaming RNGs" suitable for drawing from multiple threads. There are some on CRAN

as well as the older sprng that is no longer on CRAN.