NumbaPro CUDA Python: defining an array in a thread's registers on the GPU


I know how to create a global device array on the host using np.array, np.zeros or np.empty(shape, dtype) and then copy it to the GPU with cuda.to_device.

I also know that a shared array can be declared inside a kernel with cuda.shared.array(shape, dtype).

But how do I create a constant-size array in the registers of a particular thread inside a GPU function?

I tried cuda.device_array and np.array inside the kernel, but neither worked.

I simply want to do something like this inside each thread:

x = array(CONSTANT, int32) # should make x for each thread

Best answer:

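NumbaPro supports numba.cuda.local.array(shape, dtype) for defining thread-local arrays. The shape must be a compile-time constant (an integer or a tuple of integers), and every thread gets its own private copy of the array.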

As with CUDA C, whether that array lives in registers or in local memory is a compiler decision based on how the array is used. If every index into the local array can be resolved at compile time and there is enough register space, the compiler will keep the array in registers. Otherwise it is placed in local memory, which is per-thread storage backed by device memory and therefore much slower than registers. See this question and answer pair for more information.