I am currently trying to implement a very basic 2D convolution with cuDNN between an "image" of size 3x3 and a kernel of size 2x2, resulting in a 2x2 output.
This is my code:
// Create a cuDNN handle:
cudnnHandle_t handle;
cudnnCreate(&handle);
// Create your tensor descriptors:
cudnnTensorDescriptor_t cudnnIdesc;
cudnnFilterDescriptor_t cudnnFdesc;
cudnnTensorDescriptor_t cudnnOdesc;
cudnnConvolutionDescriptor_t cudnnConvDesc;
cudnnCreateTensorDescriptor( &cudnnIdesc );
cudnnCreateFilterDescriptor( &cudnnFdesc );
cudnnCreateTensorDescriptor( &cudnnOdesc );
cudnnCreateConvolutionDescriptor( &cudnnConvDesc );
// Set the tensor dimensions and strides (only the input tensor is shown here):
// W, H, C, N
const int dimI[] = { I_M, I_N, 1, 1 };
// Wstride, Hstride, Cstride, Nstride
const int strideI[] = { 1, 1, 1, 1 };
checkCUDAError( "SetImgDescriptor failed", cudnnSetTensorNdDescriptor(cudnnIdesc, CUDNN_DATA_HALF, 4, dimI, strideI) );
const int dimF[] = { K_M, K_N, 1, 1 };
checkCUDAError( "SetFilterDescriptor failed", cudnnSetFilterNdDescriptor(cudnnFdesc, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 4, dimF) );
const int dimO[] = { I_M - K_M + 1, I_N - K_N + 1, 1, 1 };
const int strideO[] = { 1, 1, 1, 1 };
checkCUDAError( "SetOutDescriptor failed", cudnnSetTensorNdDescriptor(cudnnOdesc, CUDNN_DATA_HALF, 4, dimO, strideO) );
checkCUDAError( "SetConvDescriptor failed", cudnnSetConvolution2dDescriptor(cudnnConvDesc, 0, 0, 1, 1, 1, 1, CUDNN_CONVOLUTION, CUDNN_DATA_HALF) );
// Set the math type to allow cuDNN to use Tensor Cores:
checkCUDAError( "SetConvMathType failed", cudnnSetConvolutionMathType(cudnnConvDesc, CUDNN_TENSOR_OP_MATH) );
// Choose a supported algorithm:
int algoCount = 0;
cudnnConvolutionFwdAlgoPerf_t algoPerf;
checkCUDAError( "GetConvForwardAlgo failed", cudnnFindConvolutionForwardAlgorithm(handle, cudnnIdesc, cudnnFdesc, cudnnConvDesc, cudnnOdesc, 1, &algoCount, &algoPerf) );
// Allocate your workspace:
void *workSpace = nullptr;
size_t workSpaceSize = 0;
checkCUDAError( "WorkspaceSize failed", cudnnGetConvolutionForwardWorkspaceSize(handle, cudnnIdesc, cudnnFdesc, cudnnConvDesc, cudnnOdesc, algoPerf.algo, &workSpaceSize) );
if (workSpaceSize > 0) {
    cudaMalloc(&workSpace, workSpaceSize);
}
However, cudnnGetConvolutionForwardWorkspaceSize fails with CUDNN_STATUS_BAD_PARAM.
According to https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnGetConvolutionForwardWorkspaceSize
this can only be due to one of the following reasons:
CUDNN_STATUS_BAD_PARAM:
At least one of the following conditions are met:
(1) One of the parameters handle, xDesc, wDesc, convDesc, yDesc is NULL.
(2) The tensor yDesc or wDesc are not of the same dimension as xDesc.
(3) The tensor xDesc, yDesc or wDesc are not of the same data type.
(4) The numbers of feature maps of the tensor xDesc and wDesc differ.
(5) The tensor xDesc has a dimension smaller than 3.
I don't see how any of them can be true.
(1) is obviously not the case, and since yDesc, wDesc, and xDesc all have 4 dimensions, (2) is ruled out as well.
Every tensor has the data type CUDNN_DATA_HALF, which is why (3) is also not true.
I don't know exactly what (4) refers to, but I think the number of feature maps is 1 for both the image and the kernel in my case.
And (5) is also not true.
Any idea why the function fails nevertheless?
I solved the error by doing this: