I am performing a series of calculations on a large number of threads using C++ AMP. The last step of the calculation, though, is to prune the result, but only for a limited number of threads. For example, if the result of the calculation is below a threshold, then set the result to 0, BUT only do this for a maximum of X threads. Essentially this is a shared counter combined with a shared conditional check.
Any help is appreciated!
My understanding of your question is the following pseudo-code performed by each thread:
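A minimal sketch of that per-thread logic, written in plain C++ with `std::atomic` standing in for the GPU-side atomic (this is not C++ AMP code). The helper names `prune_if_below` and `prune_all` are hypothetical and only serve the illustration; `global_threshold`, `global_max` and `global_counter` are the names used in this answer:

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Logic run by each thread on its own result (hypothetical helper name).
// Returns true if this call zeroed the result.
inline bool prune_if_below(float& result, float global_threshold,
                           int global_max, std::atomic<int>& global_counter) {
    if (result >= global_threshold) return false;  // nothing to prune

    // fetch_add returns the value held before the increment, so only the
    // first global_max callers that reach this line see a value < global_max.
    if (global_counter.fetch_add(1) >= global_max) return false;  // quota used up

    result = 0.0f;
    return true;
}

// Sequential stand-in for the parallel launch, only to make the sketch runnable.
// Returns how many results were zeroed.
inline int prune_all(std::vector<float>& results, float global_threshold,
                     int global_max) {
    assert(global_max >= 0);
    std::atomic<int> global_counter{0};  // shared by all "threads"
    int pruned = 0;
    for (float& r : results) {
        if (prune_if_below(r, global_threshold, global_max, global_counter)) {
            ++pruned;
        }
    }
    return pruned;
}
```

Note that in this sketch the counter keeps growing past `global_max` once the quota is exhausted; only the comparison against the previous value decides whether a given thread is allowed to prune.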
I then further assume that both `global_threshold` and `global_max` do not change during the computation (i.e. between `parallel_for_each` start and finish), so the most elegant way to pass them is through lambda capture.
On the other hand, `global_counter` clearly changes value, so it must be located in modifiable memory shared across all threads, that is, in an `array<T,N>` or `array_view<T,N>`. Since the threads incrementing this object are not synchronized, the increment needs to be performed using an atomic operation.
The above translates to the following C++ AMP code (I'm using Visual Studio 2013 syntax, but it is easily back-portable to Visual Studio 2012):