We can fill a range [first, last) with uniformly distributed random numbers using:
std::mt19937 g;
std::uniform_real_distribution<> u;
std::generate(first, last, [&]() { return u(g); });
In theory, it would be faster to run std::generate with the execution policy std::execution::par_unseq, so we could write:
std::generate(std::execution::par_unseq, first, last, [&]() { return u(g); });
But is this really safe? I suspect that concurrent access to g might be a problem. If so, how can we fix it? I've seen some strange-looking code like this:
std::generate(std::execution::par_unseq, first, last, []()
{
    thread_local std::mt19937 g;
    thread_local std::uniform_real_distribution<> u;
    return u(g);
});
But is thread_local really sensible here? And how should g be seeded here if the generated samples are supposed to be independent?
Disclaimer
Note that the documentation for std::execution::par_unseq states that "execution may be parallelized". So par_unseq does not necessarily parallelize anything; what it does depends on the underlying implementation (i.e., its behavior is implementation specific). In fact, GCC on my system (10.2.1) does not seem to have a parallelized implementation of par_unseq and uses only one thread.

Given the above, and given that you say you want to perform the sampling multiple times, it's really just better and simpler to implement the parallel sampling yourself with OpenMP or a similar alternative.
Manual implementation with OpenMP
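A minimal sketch of such a manual implementation, under a few assumptions of my own: the class name ParallelSampler and its base_seed parameter are hypothetical, each OpenMP thread gets its own std::mt19937 seeded from std::seed_seq{base_seed, thread index}, and the generators are stored in the object so that repeated sampling runs reuse them instead of re-seeding.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: owns one std::mt19937 per OpenMP thread, so that
// repeated calls to sample() keep using the same, already seeded generators.
class ParallelSampler {
public:
    explicit ParallelSampler(unsigned base_seed) {
#ifdef _OPENMP
        const int nthreads = omp_get_max_threads();
#else
        const int nthreads = 1;  // compiled without -fopenmp: serial fallback
#endif
        gens_.reserve(static_cast<std::size_t>(nthreads));
        for (int t = 0; t < nthreads; ++t) {
            // Mix the base seed with the thread index so that every
            // thread's generator starts from a different state.
            std::seed_seq seq{base_seed, static_cast<unsigned>(t)};
            gens_.emplace_back(seq);
        }
    }

    // Fills v with samples from U[0, 1); can be called any number of times.
    void sample(std::vector<double>& v) {
        const std::size_t n = v.size();
#pragma omp parallel num_threads(static_cast<int>(gens_.size()))
        {
#ifdef _OPENMP
            std::mt19937& g = gens_[static_cast<std::size_t>(omp_get_thread_num())];
#else
            std::mt19937& g = gens_[0];
#endif
            std::uniform_real_distribution<> u;  // cheap: one per thread per call
#pragma omp for schedule(static)
            for (std::size_t i = 0; i < n; ++i)
                v[i] = u(g);  // each thread only touches its own generator
        }
    }

private:
    std::vector<std::mt19937> gens_;
};
```

Without -fopenmp the pragmas are ignored and the loop runs serially with a single generator. With schedule(static) and the same number of threads, the same base_seed should reproduce the same vector, and a second call to sample() continues each thread's stream rather than starting over.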
Note: compile with -fopenmp.

Original answer(s)
Yep, exactly: unsynchronized parallel access to g is a data race, which is what the thread_local declarations in your second code snippet are trying to avoid.

And yes, thread_local storage duration is sensible, in fact critical, in this case from both a performance and a thread-safety point of view. You could theoretically omit thread_local and create local variables on each invocation, but, as you may know, generator initialization is a heavy operation that really only needs to be done once. A thread_local definition is a nice and simple way to address both thread safety and one-time initialization at once, as each thread-local object is instantiated only once per thread and lives as long as the thread.

As for the independence of the samples, std::mt19937 should already take care of this by itself, provided that you seed all the generators differently.

The only problem I can see in the second code snippet is that g is always seeded (by the default constructor) with the same default seed, so you will end up with a bunch of copies of the same values (one copy per thread). You should use different seeds, and a good way of doing that could be to use a random seed for each thread, as Marek R suggests in the comments above.

Addressing further concerns:
This shouldn't be a problem: if you can write a function that returns the correct seed on each invocation, you can call it in the initializer of the thread_local generator, so that each thread is seeded once with its own value.
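A minimal sketch of that idea, assuming a hypothetical next_seed() that simply hands out consecutive seeds from an atomic counter starting at an arbitrary value (replacing its body with std::random_device{}() would give a random seed per thread instead). The block also checks the default-seed problem: two default-constructed std::mt19937 objects produce the same sequence. One hedged caveat: the usage comment uses std::execution::par rather than par_unseq, because under par_unseq invocations on the same thread are also allowed to be unsequenced with respect to each other, which could let two calls touch the same thread_local generator at once; par seems the safer choice for this lambda.

```cpp
#include <atomic>
#include <cassert>
#include <random>

// Hypothetical seed source: returns a different seed on every call.
// The starting value 12345 is an arbitrary assumption; the atomic counter
// guarantees that threads calling it concurrently never get the same seed.
unsigned next_seed() {
    static std::atomic<unsigned> counter{12345u};
    return counter.fetch_add(1u, std::memory_order_relaxed);
}

// Body of the lambda you would pass to std::generate, e.g.
//   std::generate(std::execution::par, first, last, sample);
// Each thread constructs its generator once, on first use in that thread,
// seeded with its own value from next_seed().
double sample() {
    thread_local std::mt19937 g{next_seed()};
    thread_local std::uniform_real_distribution<> u;
    return u(g);
}

// For comparison: two default-constructed engines share the default seed
// (5489), so they produce identical sequences.
bool default_engines_agree() {
    std::mt19937 a;
    std::mt19937 b;
    for (int i = 0; i < 100; ++i)
        if (a() != b()) return false;
    return true;
}
```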
If you want to re-sample the vector multiple times, that's a different story, because in that case the above code would re-instantiate g for each thread that the par_unseq implementation spawns, every time you re-sample the vector. So std::generate with a std::execution policy is not really a good shortcut in your case. If you really want to stick to it, you will have to drop the thread_local and create global static storage (such as a map) to cache the generators between different sampling runs, and retrieve them from within the lambda passed to std::generate().

Writing a simple manual implementation of the parallel sampling using OpenMP would be much easier to manage in your case. See the "Manual implementation with OpenMP" section above.