Why is z always zero in CUDA kernel


I am using Cudafy to do some calculations on an NVIDIA GPU (Quadro K1100M, compute capability 3.0, if it matters).

My question is, when I use the following

cudaGpu.Launch(new dim3(44, 8, num), new dim3(8, 8)).MyKernel...

why are my z indexes from the GThread instance always zero when I use this in my kernel?

int z = thread.blockIdx.z * thread.blockDim.z + thread.threadIdx.z;

Furthermore, if I have to do something like

cudaGpu.Launch(new dim3(44, 8, num), new dim3(8, 8, num)).MyKernel...

z does give different indexes, as it should, but num can't be very large because of the limit on the number of threads per block. Any suggestions on how to work around this?

Edit

Another way to phrase it: can I use thread.z in my kernel (for anything useful) when the block size is only 2D?


Taro On BEST ANSWER

On all currently supported hardware, CUDA allows the use of both three dimensional grids and three dimensional blocks. On compute capability 1.x devices (which are no longer supported), grids were restricted to two dimensions.

However, CUDAfy currently uses a deprecated runtime API function to launch kernels, and silently uses only gridDim.x and gridDim.y, without taking gridDim.z into account:

_cuda.Launch(function, gridSize.x, gridSize.y);

As seen in the function DoLaunch() in CudaGPU.cs.

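So while you can specify a three-dimensional grid in CUDAfy, the third dimension is ignored during the kernel launch. With a 2D block, blockDim.z is 1 and threadIdx.z is 0, so the z expression in the question always evaluates to zero. Thanks to Florent for pointing this out!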
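One possible workaround, given that only the x and y grid dimensions reach the launch, is to fold the z slices into the y grid dimension and recover them inside the kernel with integer division and modulo. The sketch below shows only that index arithmetic as plain host-side C++ so it can be checked without a GPU. The names `FoldedIndex`, `foldedGridY` and `unfold` are hypothetical helpers invented for illustration and are not part of CUDAfy or CUDA. In a kernel, the same two expressions would be applied to `thread.blockIdx.y`.

```cpp
#include <cassert>

// Hypothetical helper type (not part of CUDAfy or CUDA):
// the logical coordinates recovered from a folded blockIdx.y.
struct FoldedIndex {
    int blockY; // logical block index along y, in [0, gridY)
    int z;      // logical z slice, in [0, num)
};

// Grid extent along y to launch with when `num` z slices
// are stacked on top of a logical grid of `gridY` blocks.
int foldedGridY(int gridY, int num) {
    assert(gridY > 0 && num > 0);
    return gridY * num;
}

// Recover the logical (blockY, z) pair from the physical blockIdx.y.
// Slice 0 occupies physical indices [0, gridY), slice 1 occupies
// [gridY, 2 * gridY), and so on.
FoldedIndex unfold(int physicalBlockIdxY, int gridY) {
    assert(gridY > 0 && physicalBlockIdxY >= 0);
    FoldedIndex f;
    f.blockY = physicalBlockIdxY % gridY;
    f.z = physicalBlockIdxY / gridY;
    return f;
}
```

Applied to the question, this would mean launching with a grid of 44 by `foldedGridY(8, num)` blocks and a block of 8 by 8 threads, then computing the row as `blockY * blockDim.y + threadIdx.y` and using `z` directly as the slice index. The y extent of a grid is limited to 65535 blocks on current hardware, so with 8 blocks per slice this assumes num stays below about 8000.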