Issue with using Multi GPU NVIDIA

1k Views Asked by At

I'm learning how to use multi GPU for my CUDA application. I tried out a simple program which successfully ran on a system having two Tesla C2070. But when I tried to run the same program on a different system having a Tesla K40c and a Tesla C2070, it shows a segmentation fault. What might be the problem? I'm sure that there is no problem with the code. Is there any settings to be done in the environment? I have attached my code here for your reference.

#include <stdio.h>
#include "device_launch_parameters.h"
#include "cuda_runtime_api.h"

__global__ void testA(int *a)
{
   int i = blockIdx.x * blockDim.x + threadIdx.x;
   a[i] = a[i] * 2;
}

int main()
{
   int *ai, *bi, *ao, *bo;
   int iter;
   cudaStream_t streamA, streamB;
   cudaSetDevice(0);
   cudaStreamCreate(&streamA);
   cudaMalloc((void**)&ao, 10 * sizeof(int));
   cudaHostAlloc((void**)&ai, 10 * sizeof(int), cudaHostAllocMapped);
   for(iter=0; iter<10; iter++)
   {
       ai[iter] = iter+1;
   }

   cudaSetDevice(1);
   cudaStreamCreate(&streamB);
   cudaMalloc((void**)&bo, 10 * sizeof(int));
   cudaHostAlloc((void**)&bi, 10 * sizeof(int), cudaHostAllocMapped);
   for(iter=0; iter<10; iter++)
   {
       bi[iter] = iter+11;
   }

   cudaSetDevice(0);
   cudaMemcpyAsync(ao, ai, 10 * sizeof(int), cudaMemcpyHostToDevice, streamA);
   testA<<<1, 10, 0, streamA>>>(ao);
   cudaMemcpyAsync(ai, ao, 10 * sizeof(int), cudaMemcpyDeviceToHost, streamA);

   cudaSetDevice(1);
   cudaMemcpyAsync(bo, bi, 10 * sizeof(int), cudaMemcpyHostToDevice, streamB);
   testA<<<1, 10, 0, streamB>>>(bo);
   cudaMemcpyAsync(bi, bo, 10 * sizeof(int), cudaMemcpyDeviceToHost, streamB);

   cudaSetDevice(0);
   cudaStreamSynchronize(streamA);

   cudaSetDevice(1);
   cudaStreamSynchronize(streamB);

   printf("%d %d %d %d %d\n",ai[0],ai[1],ai[2],ai[3],ai[4]);
   printf("%d %d %d %d %d\n",bi[0],bi[1],bi[2],bi[3],bi[4]);
   return 0;
}

The segmentation fault occurs when bi array is initialized inside the for loop, which means the memory is not allocated for bi.

1

There are 1 best solutions below

2
On

With the new information you've provided based on the error checking, the problem you were having was due to the ECC error.

When a GPU has a double-bit ECC error detected in the current session, it is no longer usable for compute activities until either:

  1. the GPU is reset (e.g. via system reboot, or via driver unload/reload, or manually via nvidia-smi, etc.),

(or)

  1. ECC is disabled (which usually also may require a system reboot or gpu reset)

You can review ECC status of your GPUs with the nvidia-smi command. You probably already know which GPU was reporting the ECC error, since you disabled ECC, but in case not, based on your initial report it would be the one that was associated with the cudaSetDevice(1); command, which probably should have been the Tesla C2070 (i.e. not the K40).