AMP C++ compute maximum value over an array

391 Views Asked by seb007 At 13 January 2017 at 14:27

I want to compare the maximum absolute difference between two three-dimensional arrays with C++ AMP.

With OpenMP it is easy. Considering 2 arrays

float*** u, ***uOld;

The code is:

double residual = 0;
#pragma omp parallel for schedule(static) collapse(3) reduction(max : residual)
for (int i = 0; i < nbX; i++)
    for (int j = 0; j < nbY; j++)
        for (int k = 0; k < nbTheta; k++)
            residual = std::max(residual, fabs(u[i][j][k] - uOld[i][j][k]));

It would be easy to use max_element from AMP Algorithms, but it is not implemented. I think of something like this, but a reduction is needed at the outer loop level:

extent<1> extTheta(nbTheta);
parallel_for_each(extTheta, [=, &u_, &uOld_](index<1> iTheta) restrict(amp)
{
    type residual = 0;
    for (int iX = 0; iX < nbX; iX++)
    for (int iY = 0; iY < nbY; iY++)
    residual = fast_math::fmax(residual, fast_math::fabs(u_[iX][iY][iTheta] - uOld_[iX][iY][iTheta]));
})

The data is on the GPU, and I do not want it to transit on the GPU for efficiency reasons. How to do it efficiently?

Original Q&A

There are 1 best solutions below

seb007 On 13 January 2017 at 18:12

Here is a solution, inspired from msdn blog: https://blogs.msdn.microsoft.com/nativeconcurrency/2012/03/08/parallel-reduction-using-c-amp

parallel_for_each(extent<3>(nbTheta, nbX, nbY), [=, &u_, &uOld_](index<3> idx) restrict(amp)
{
    uOld_[idx[0]][idx[1]][idx[2]] = abs1(u_[idx[0]][idx[1]][idx[2]] - uOld_[idx[0]][idx[1]][idx[2]]);
});


array_view<float, 1> residualReduce_ =  uOld_.view_as<1>(extent<1>(nbTheta*nbX*nbY));
array_view<float, 1> residual_ = residualReduce_.section(index<1>(0), extent<1>(1));
for (unsigned shift = nbTheta*nbX*nbY / 2; shift > 0; shift /= 2)
{
    parallel_for_each(extent<1>(shift), [=](index<1> idx) restrict(amp)
    {
        residualReduce_[idx[0]] = fast_math::fmax(residualReduce_[idx[0]], residualReduce_[idx[0] + shift]);
        if (shift % 2){ //If odd, each thread includes a shifted entry. One will match the end of the queue
            residualReduce_[idx[0]] = fast_math::fmax(residualReduce_[idx[0]], residualReduce_[idx[0] + shift + 1]);
        }
    });
}
concurrency::copy(residual_, &residual);
parallel_for_each(extent<3>(nbTheta, nbX, nbY), [=, &u_, &uOld_](index<3> idx) restrict(amp)
{
    uOld_[idx[0]][idx[1]][idx[2]] = u_[idx[0]][idx[1]][idx[2]];
})

Unlike the snippet in the question, the solution includes the update of uOld to U.

The reduction is not the most efficient, but is still fast compared to the rest of the code.

AMP C++ compute maximum value over an array

There are 1 best solutions below

Related Questions in MULTITHREADING

Related Questions in PARALLEL-PROCESSING

Related Questions in FOLD

Related Questions in C++-AMP

Trending Questions

Popular # Hahtags

Popular Questions