I am a physicist writing a program that involves generating a very large number (on the order of a few billion) of random numbers drawn from a Gaussian distribution. I am trying to use C++11. Consecutive draws are separated by an operation that should take very little time. My biggest worry is whether generating so many random numbers, with such a small time gap between them, could lead to sub-optimal performance. I am testing certain statistical properties that rely heavily on the independence of the random numbers, so my result is particularly sensitive to these issues. My question is: with the kinds of numbers shown in the code below (a simplified version of my actual code), am I doing something obviously (or even subtly) wrong?
#include <cmath>
#include <iostream>
#include <random>
#include <vector>
// Several other includes, etc.
using namespace std;

int main () {
    int dim_vec(400), nStats(1e8);
    vector<double> vec1(dim_vec), vec2(dim_vec);
    // Initialize the above vectors, which are order 1 numbers.
    random_device rd;
    mt19937 generator(rd());
    double y(0.0);
    double l(0.0);
    for (int i(0);i<nStats;i++)
    {
        for (int j(0);j<dim_vec;j++)
        {
            normal_distribution<double> distribution(0.0,1/sqrt(vec1[j]));
            l=distribution(generator);
            y+=l*vec2[j];
        }
        cout << y << endl;
        y=0.0;
    }
}
The normal_distribution is allowed to have state. With this particular distribution, it is common to generate numbers in pairs: every other call computes two values, and the following call returns the second, cached value. By constructing a new distribution on every call, you throw that cache away. Fortunately, you can "shape" a single distribution by passing it a different normal_distribution::param_type on each call.
I'm not familiar with all implementations of std::normal_distribution. However, I wrote the one for libc++, so I can tell you with some certainty that my slight rewrite of your code will have a positive performance impact. I am unsure what impact it will have on the quality, except to say that I know it won't degrade it.

Update
Regarding Severin Pappadeux's comment below about the legality of generating numbers in pairs within a distribution, see N1452, where this very technique is discussed and allowed for.