So I'm writing a neural network library using Aparapi (which generates OpenCL from Java code). Anyway, there are many situations where I need to do complex index arithmetic to find the source/destination node for a given weight when doing forward passes and backpropagation.
In many cases this is a very simple 1D-to-2D formula, but in some cases, such as for convolutional nets, I need to do a somewhat more complex operation to find the index (often something like 3D-to-1D-to-3D).
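To make that concrete, here is a minimal sketch of the kind of index arithmetic I mean. The method names and the row-major layout convention are my own, not part of any library:

```java
// Hypothetical helpers illustrating flat-index <-> coordinate math.
// Assumes row-major layout: the last coordinate varies fastest.
final class IndexMath {
    // 2D (row, col) -> flat 1D index
    static int flatten2D(int row, int col, int width) {
        return row * width + col;
    }

    // 3D (channel, row, col) -> flat 1D index
    static int flatten3D(int c, int r, int col, int height, int width) {
        return (c * height + r) * width + col;
    }

    // flat 1D index -> 3D (channel, row, col); inverse of flatten3D
    static int[] unflatten3D(int idx, int height, int width) {
        int col = idx % width;
        int r = (idx / width) % height;
        int c = idx / (width * height);
        return new int[] { c, r, col };
    }
}
```

The 3D-to-1D-to-3D case is then just `unflatten3D(flatten3D(...))` with different shapes on each side.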
I have been sticking with algorithms to compute these indices. The alternative would be to simply store the source and destination indices for each weight in a constant int array. I have avoided this because it would almost double the memory usage.
I was wondering what the speed differences would be for computing indices vs reading them from a constant array? Am I losing speed in exchange for memory? Is the difference significant?
Computation is almost always faster on the GPU than a global memory access that produces the same result (like a look-up table). In particular, because the GPU keeps so many work-items "in flight" at once, the arithmetic for one group happens while another group is waiting on its memory accesses, so the math is effectively hidden behind the memory latency. So if your math is not too complex, prefer to compute rather than burn a global memory access.
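The two strategies can be sketched side by side for a fully connected layer. This is a plain-Java illustration (all names are mine, and the `% inputs` mapping is just one example of the 1D-to-2D formula mentioned in the question); in an Aparapi kernel the trade-off is the same, but `sourceTable` would live in constant/global device memory:

```java
// Sketch of the trade-off: compute the source-neuron index per weight,
// or read it from a precomputed table (costing extra memory and, on a
// GPU, an extra global/constant memory access per weight).
final class WeightIndexDemo {
    final int inputs;          // neuron count in the source layer
    final int[] sourceTable;   // precomputed source index per weight

    WeightIndexDemo(int inputs, int outputs) {
        this.inputs = inputs;
        this.sourceTable = new int[inputs * outputs];
        for (int w = 0; w < sourceTable.length; w++) {
            sourceTable[w] = w % inputs;   // filled once on the host
        }
    }

    // Strategy 1: cheap arithmetic in the kernel body, no extra memory
    int computedSource(int weightIndex) {
        return weightIndex % inputs;
    }

    // Strategy 2: one extra memory read per weight
    int lookedUpSource(int weightIndex) {
        return sourceTable[weightIndex];
    }
}
```

Both return identical indices; the question is only whether a few integer ops or a memory fetch is cheaper, and on a GPU the integer ops usually win.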