Using a class member of type 'array'


I know this is a simple question, but I have not been able to find the answer.

I want a C++ class that manages a large block of memory, where the memory is regularly processed on the GPU when a certain class method is called. The class constructor is passed the size of the array, and after construction the array size never changes. The method that calls parallel_for_each should not waste processor cycles or memory when it's not necessary.

How do I do this?

I can't create a concurrency::array as a class member, because I need to know how big the array will be before it is created. I can't have a member that is a pointer to a concurrency::array (and then allocate it with 'new' in, for example, the constructor), because I can't figure out how to specify it to the parallel_for_each.

On a side note, I don't normally need to copy the array between the GPU and host, but it's fine if for some reason I have to do that, as long as it's not done regularly. Otherwise it would waste processor cycles and memory in proportion to the size of the array.

Here's an example of something like what I want. Of course, the reference/pointer captured by the parallel_for_each is wrong. (This is not checked for syntax):

class MyClass
{
    int* myHostArrayPtr;
    concurrency::array<int,1>* myGpuArrayPtr;

    MyClass(int size)
    {
        myHostArrayPtr = new int[size];  // array of size ints

        memset(myHostArrayPtr,0,size * sizeof(int));

        myGpuArrayPtr = new concurrency::array<int,1>(size,myHostArrayPtr);
    }

    void ProcessInGpu()
    {
        parallel_for_each(
            myGpuArrayPtr->extent,
            [&myGpuArrayPtr](index<1> i) restrict(amp)
            {
                myGpuArray[i]+=14;
            }
        );
    }
};

There are 2 best solutions below


OK, I think I figured it out. One must put the parallel_for_each in a function that takes references to the array objects, and then they can be passed by reference to the parallel_for_each. To wit:

void MyClass::Process(concurrency::array<int,1>& myGpuArray){
    parallel_for_each(
        myGpuArray.extent,
        [&myGpuArray](index<1> i) restrict(amp)
        {
            myGpuArray[i]+=14;
        }
    );
}

This is interesting because it's really a workaround for what looks like a C++ limitation: the lambda can't capture the pointed-to array by reference directly, so the function call is used to bind it to a reference (I think?), and it does so without calling the copy constructor.

EDIT:

Yep, the above works. I benchmarked it and it's just as fast as the code that uses a local array. Also, I tested it by converting a pointer to a reference in the call, and that worked, too. So it will work with dynamically allocated arrays.
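For anyone who wants to see the pointer-to-reference step on its own, here is a minimal sketch in plain C++ without AMP. `Buffer`, `Owner`, `add_to_all` and `process_with_local_reference` are hypothetical names made up for this illustration, and `Buffer` is only a stand-in for `concurrency::array<int,1>` that counts copies. It shows the language-level behaviour only: dereferencing the pointer and binding the result to a reference (either a function parameter or a local reference) does not call the copy constructor. It does not show what a `restrict(amp)` lambda will accept.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for concurrency::array<int,1>. It counts how many
// times it is copy-constructed, so an accidental copy would be visible.
struct Buffer {
    static int copies;
    std::vector<int> data;

    explicit Buffer(std::size_t size) : data(size, 0) {}
    Buffer(const Buffer& other) : data(other.data) { ++copies; }
    Buffer& operator=(const Buffer&) = delete;
};
int Buffer::copies = 0;

// Mirrors Process() above: the buffer arrives by reference and the lambda
// captures that reference, so the caller's buffer is modified in place.
void add_to_all(Buffer& buffer, int amount) {
    auto body = [&buffer, amount](std::size_t i) { buffer.data[i] += amount; };
    for (std::size_t i = 0; i < buffer.data.size(); ++i) body(i);
}

class Owner {
public:
    // The size is only known at construction time, so the buffer is
    // allocated dynamically and held through a pointer.
    explicit Owner(std::size_t size) : buffer_(new Buffer(size)) {
        assert(size > 0);
    }

    // Variant 1: dereference the pointer in the call; the parameter binds
    // to the pointed-to object.
    void process() { add_to_all(*buffer_, 14); }

    // Variant 2: bind a local reference and capture it in the lambda.
    void process_with_local_reference() {
        Buffer& buffer = *buffer_;  // binds to the object, no copy
        auto body = [&buffer](std::size_t i) { buffer.data[i] += 14; };
        for (std::size_t i = 0; i < buffer.data.size(); ++i) body(i);
    }

    int at(std::size_t i) const { return buffer_->data[i]; }

private:
    std::unique_ptr<Buffer> buffer_;
};
```

Calling `process()` and then `process_with_local_reference()` on an `Owner` adds 14 twice to every element while `Buffer::copies` stays at 0, which matches the observation that the buffer is never copied along the way.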


I think you need templates here:

template <std::size_t N> class MyClass {
    concurrency::array<int,N> myGpuArray;
    ...
};

int main () {
    MyClass<10> someName;
    ...
}