I am using Cudafy to do some calculations on an NVIDIA GPU (Quadro K1100M, compute capability 3.0, if it matters).
My question is, when I use the following
cudaGpu.Launch(new dim3(44, 8, num), new dim3(8, 8)).MyKernel...
why are my z indices from the GThread instance always zero when I use this in my kernel?
int z = thread.blockIdx.z * thread.blockDim.z + thread.threadIdx.z;
Furthermore, if I instead do something like
cudaGpu.Launch(new dim3(44, 8, num), new dim3(8, 8, num)).MyKernel...
z does give different indices as it should, but num can't be very large because of the restriction on the number of threads per block. Any suggestions on how to work around this?
Edit
Another way to phrase it: can I use thread.blockIdx.z in my kernel (for anything useful) when the block size is only 2D?
On all currently supported hardware, CUDA allows the use of both three dimensional grids and three dimensional blocks. On compute capability 1.x devices (which are no longer supported), grids were restricted to two dimensions.
However, CUDAfy currently uses a deprecated runtime API function to launch kernels, and silently uses only gridDim.x and gridDim.y, not taking gridDim.z into account.
This can be seen in the function DoLaunch() in CudaGPU.cs.
So while you can specify a three-dimensional grid in CUDAfy, the third dimension is ignored during the kernel launch. Thanks to Florent for pointing this out!