cudaMemcpy() gives segfault when using Type**

539 Views Asked by At

I want to copy a double pointer object to the host and compute over it on the GPU Device. When doing cudaMemcpy of the object to device it throws SEGFAULT.

BMP Input;
Input.ReadFromFile( fileName );
WIDTH = Input.TellWidth();
HEIGHT = Input.TellHeight();
RGBApixel** imageData = new RGBApixel* [HEIGHT];
for (int i = 0; i < HEIGHT; i++) 
    imageData[i] = new RGBApixel [WIDTH];

for(int j=0;j<Input.TellHeight();j++){
    for(int i=0;i<Input.TellWidth();i++){
      imageData[j][i] = Input.GetPixel(i,j);
    }
  }
long long imageSize = WIDTH*HEIGHT*sizeof(RGBApixel *);

RGBApixel** dev_imgdata,dev_imgdata_out;
//Allocating cudaMemory
cudaMalloc( (void **) &dev_imgdata, imageSize );
cudaMalloc( (void **) &dev_imgdata_out, imageSize );

Now the below line throws segfault

cudaMemcpy(dev_imgdata,imageData,imageSize,cudaMemcpyHostToDevice);
2

There are 2 best solutions below

0
On

When declaring RGBApixel** imageData = new RGBApixel* [HEIGHT]; you have absolutely no guarantee that imageData will occupy a contiguous block of memory.

cudaMemcpy copies contiguous blocks of memory into the device RAM. Your statement tries to copy the start addresses of each matrix row but not the actual data. Also when using cudaMalloc, you need to properly allocate for each line, exactly as you did for the host buffer.

What you need to do is to declare imageData as just a RGMAPixel* - basically put the matrix in a single vector and use proper indexing and it will work.

You can also copy each line at a time but that's not a very good practice since every memory access will require an extra indirection and you will mess the caching efficiency.

0
On

Also, make sure that when you compile your program, you use -arch sm_20 to enable extra options for your graphic card ( if it has Capability 2.0). Without it I believe you can't use double and the result is unpredictable (or the double is diminished to float)