Recently I started working with CUDA and I read an introductory book on the computing language. To see if I understood it well, I considered the following problem.
Consider a function minimize f(x,y) on the grid [-1,1] X [-1,1]. This provided me with a few practical questions and I would like to have your look on things.
Do I explicitly calculate the grid? If I create the grid on the CPU, then I'll have to transfer the information to the GPU. I can then use a 2D block layout and access data efficiently using texture memory. Is it then best to use square blocks or perhaps blocks of different shapes?
Suppose I don't explicitly make a grid. I can assign discretise the X and Y direction with constant float arrays (which provides fast memory access) and then use 1 list of blocks.
Thanks!
This was an interesting question for me because it represents a type of problem that I think is rare:
In other words, pretty much all compute, with not much dependence on data transfer, or even global memory usage/bandwidth.
Having said that, the question seems to be looking for a brute-force search approach to functional optimization/minimization, which is not an efficient technique for functions that are amenable to other optimization methods. But as a learning exercise, it's interesting (to me, anyway). It may also be useful for functions that are otherwise difficult to handle such as functions with discontinuities or other irregularities.
To answer your questions:
I wouldn't bother calculating the grid on the CPU. (I assume by "grid" you mean the functional value of
f
at each point on the grid.) First of all, this is a fairly computationally intensive task - which GPUs are good at, and secondly, it is potentially a large data set, so transferring it to the GPU (so the GPU can then do the search) will take time. I propose to let the GPU do this (compute the functional value at each grid point.) Since we won't be using global access to data for this, texture memory is not an issue.Yes, you could use a 1D array of blocks (list) or a 2D array. I don't think this significantly impacts the problem either way, and I think the 2D grid approach fits the problem better (and I think allows for slightly cleaner code) so I would suggest starting with a 2D array of blocks.
Here's a sample code that might be interesting to play with or crystallize ideas. Each thread has the responsibility to compute its respective value of
x
andy
, and then the functional valuef
at that point. Then a reduction followed by a block-draining reduction is used to search over all computed values for the minimum value (in this case).Notes:
OPT
andTST
macros, to perform a different kind of optimization - such as maximum instead of minimum.XNR
,XPR
, etc.