cudaMemcpy returning cudaErrorInvalidValue


I have written a method that is called from a .cpp file for the purpose of running cudaMemcpy. The method is below:

void copy_to_device(uint32_t *host, uint32_t *device, int size)
{
    cudaError_t ret; 
    ret = cudaMemcpy(device, host, size*sizeof(uint32_t), cudaMemcpyHostToDevice); 

    if(ret == cudaErrorInvalidValue)
        printf("1!\n"); 
    else if(ret == cudaErrorInvalidDevicePointer)
        printf("2!\n"); 
    else if(ret == cudaErrorInvalidMemcpyDirection)
        printf("3!\n"); 
}

My .cpp file calls it like this:

uint32_t *input_device;
device_malloc(input_device, INPUT_HEIGHT*INPUT_WIDTH);
uint32_t  *oneDinput = TwoDtoOneD(input, INPUT_HEIGHT, INPUT_WIDTH); 
copy_to_device(oneDinput, input_device, INPUT_HEIGHT*INPUT_WIDTH);

All TwoDtoOneD does is take a 2D array, convert it to a 1D array, and return it. Whenever I call copy_to_device, cudaMemcpy returns cudaErrorInvalidValue, which isn't well documented on NVIDIA's website. Does anyone know what is wrong with the parameters I am passing that causes this error? It leads to problems later, during kernel execution. If you need more details, please ask.

Here's the method device_malloc:

void device_malloc(uint32_t *buffer, int size)
{
    cudaMalloc((void **) &buffer, size*sizeof(uint32_t)); 
}

Best answer:

The problem is here:

uint32_t *input_device;
device_malloc(input_device, INPUT_HEIGHT*INPUT_WIDTH);

Whatever device_malloc does, it cannot modify input_device: the pointer is passed by value, so the function only ever sees a copy. The one exception would be a first parameter declared as a reference to a pointer, but I am ready to bet it is not.
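The pass-by-value behaviour can be reproduced without a GPU. The following is a minimal host-only sketch, assuming std::malloc as a stand-in for cudaMalloc; the function names alloc_by_value, alloc_by_pointer and the two check helpers are hypothetical and do not come from the question.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the question's device_malloc: the pointer is
// passed by value, so the assignment below only changes the local copy.
void alloc_by_value(uint32_t *buffer, int size)
{
    buffer = static_cast<uint32_t *>(std::malloc(size * sizeof(uint32_t)));
    std::free(buffer); // released here only so the sketch does not leak
}

// Pointer-to-pointer version: the new address is written through the
// caller's variable, so the caller sees it after the call returns.
void alloc_by_pointer(uint32_t **buffer, int size)
{
    *buffer = static_cast<uint32_t *>(std::malloc(size * sizeof(uint32_t)));
}

// Returns true if the caller's pointer is still null after alloc_by_value.
bool by_value_leaves_caller_unchanged()
{
    uint32_t *p = nullptr;
    alloc_by_value(p, 16);
    return p == nullptr;
}

// Returns true if the caller's pointer was updated by alloc_by_pointer.
bool by_pointer_updates_caller()
{
    uint32_t *p = nullptr;
    alloc_by_pointer(&p, 16);
    bool updated = (p != nullptr);
    std::free(p);
    return updated;
}
```

In the question the caller's pointer is not even initialized to null, so after the call it still holds whatever happened to be on the stack.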

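You need to change the first parameter of device_malloc to a pointer to a pointer (inside the function, pass it straight to cudaMalloc as (void **) buffer instead of taking its address again), and call it like this: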

device_malloc(&input_device, INPUT_HEIGHT*INPUT_WIDTH);

Or just have device_malloc return a pointer to the allocated memory.
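Both options might look roughly like the sketch below. It is only an illustration under some assumptions: returning cudaError_t instead of void is my addition, and the name device_malloc_ret is hypothetical, chosen so that both variants can sit in one file.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Option 1: take the address of the caller's pointer so that cudaMalloc
// can store the device address in it. Returning the status (instead of
// void, as in the question) is an assumption made for this sketch.
cudaError_t device_malloc(uint32_t **buffer, int size)
{
    return cudaMalloc(reinterpret_cast<void **>(buffer),
                      size * sizeof(uint32_t));
}

// Option 2 (hypothetical name): return the device pointer directly,
// or nullptr if the allocation failed.
uint32_t *device_malloc_ret(int size)
{
    uint32_t *buffer = nullptr;
    if (cudaMalloc(reinterpret_cast<void **>(&buffer),
                   size * sizeof(uint32_t)) != cudaSuccess)
        return nullptr;
    return buffer;
}
```

With option 2 the call site becomes uint32_t *input_device = device_malloc_ret(INPUT_HEIGHT*INPUT_WIDTH); and a null result signals that the allocation failed.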

To answer your question more directly: cudaMemcpy returns an error because its first argument, device, is not a valid device pointer, which the CUDA runtime is able to check. It probably holds a garbage value, since it is never initialized because of the issue above.

As a side note unrelated to the issue, you may want to use the cudaGetErrorString function for a more convenient way to print the status.
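As a rough sketch of that suggestion, copy_to_device could report any failure through cudaGetErrorString rather than comparing against individual error codes. The cudaError_t return type and the const qualifier on host are assumptions added here; the question's version returns void.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Copies `size` elements from host memory to device memory and prints a
// readable message for any error code, not just three hand-picked ones.
cudaError_t copy_to_device(const uint32_t *host, uint32_t *device, int size)
{
    cudaError_t ret = cudaMemcpy(device, host, size * sizeof(uint32_t),
                                 cudaMemcpyHostToDevice);
    if (ret != cudaSuccess)
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(ret));
    return ret;
}
```

Returning the status lets the caller stop before launching a kernel on a buffer that was never filled.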