I want to compare the maximum absolute difference between two three-dimensional arrays with C++ AMP.
With OpenMP it is easy. Considering 2 arrays
float*** u, ***uOld;
The code is:
double residual = 0;
#pragma omp parallel for schedule(static) collapse(3) reduction(max : residual)
for (int i = 0; i < nbX; i++)
for (int j = 0; j < nbY; j++)
for (int k = 0; k < nbTheta; k++)
residual = std::max(residual, fabs(u[i][j][k] - uOld[i][j][k]));
It would be easy to use max_element from AMP Algorithms, but it is not implemented. I think of something like this, but a reduction is needed at the outer loop level:
extent<1> extTheta(nbTheta);
parallel_for_each(extTheta, [=, &u_, &uOld_](index<1> iTheta) restrict(amp)
{
type residual = 0;
for (int iX = 0; iX < nbX; iX++)
for (int iY = 0; iY < nbY; iY++)
residual = fast_math::fmax(residual, fast_math::fabs(u_[iX][iY][iTheta] - uOld_[iX][iY][iTheta]));
})
The data is on the GPU, and I do not want it to transit on the GPU for efficiency reasons. How to do it efficiently?
Here is a solution, inspired from msdn blog: https://blogs.msdn.microsoft.com/nativeconcurrency/2012/03/08/parallel-reduction-using-c-amp
Unlike the snippet in the question, the solution includes the update of uOld to U.
The reduction is not the most efficient, but is still fast compared to the rest of the code.