Why do nvvp and Nsight's profiler give different results?


I want to test the gst_inst_128bit counter. For the same program, nvvp reports a large number of gst_inst_128bit instructions executed, whereas Nsight's profiler reports gst_inst_32bit instructions instead, four times as many. Both should be running the same program. How can this happen?

The experiment was run on Linux with CUDA 5.0 on a GTX 580. The program only copies data from one array to another inside a kernel function. In main:

cudaMalloc((void**)&dev_a, NUM * sizeof(float));
cudaMalloc((void**)&dev_b, NUM * sizeof(float));
kernel<<<grid,block>>>((uint4 *)dev_a, (uint4 *)dev_b);

The kernel:

__global__ void kernel(uint4 *a, uint4 *b){
        unsigned int id = blockIdx.x * THREAD_NUM + threadIdx.x;
        for(unsigned int i = 0;i < LOOP/4;i++){
                b[id + i * GRID_NUM * THREAD_NUM] = a[id + i * GRID_NUM * THREAD_NUM];
        }
        return;
}

There is 1 best solution below


The profiler in Nsight EE and the standalone Visual Profiler on Linux are built on the same codebase. Please make sure that:

  1. You are using the same executable.
  2. There is no difference in environment variable values (e.g. LD_LIBRARY_PATH).
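
One way to check both points is to have the program report its own binary path and the relevant environment variable at startup, then compare the output of a run under each profiler. The sketch below is plain host-side C++ that can be pasted into the .cu file. The helper names `env_or_unset`, `executable_path` and `report_launch_context` are hypothetical, and reading `/proc/self/exe` assumes Linux.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <unistd.h>   // readlink (POSIX)

// Hypothetical helper: value of an environment variable, or "(unset)".
std::string env_or_unset(const char *name) {
    const char *value = std::getenv(name);
    return value ? std::string(value) : std::string("(unset)");
}

// Hypothetical helper: absolute path of the running binary (Linux only).
// Returns an empty string if the path cannot be read.
std::string executable_path() {
    char buf[4096];
    ssize_t len = readlink("/proc/self/exe", buf, sizeof(buf) - 1);
    if (len < 0) return std::string();
    return std::string(buf, static_cast<size_t>(len));
}

// Call once at the top of main(), then compare the output of both profiler runs.
void report_launch_context() {
    std::fprintf(stderr, "executable:      %s\n", executable_path().c_str());
    std::fprintf(stderr, "LD_LIBRARY_PATH: %s\n",
                 env_or_unset("LD_LIBRARY_PATH").c_str());
}
```

If the two runs print different paths or different library paths, the profilers are not measuring the same thing.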

Please note that the Nsight EE launch UI can be slightly confusing. When you click "Profile" after debugging the debug build, it may actually profile the debug executable, because it tries to keep all the custom launch settings (e.g. command-line arguments, working folder, etc.) you may have set up. From the main menu, click Run->Profile Configurations... to see the settings Nsight uses when profiling your application.
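
If it is unclear which build a profiler actually launched, the binary can announce it. This is a minimal sketch, assuming an nvcc version that defines the `__CUDACC_DEBUG__` macro when compiling in device-debug mode (`-G`). Whether CUDA 5.0 already defines it is not verified here, and the function names `device_debug_build` and `report_build_mode` are hypothetical. Device-debug builds disable most device code optimizations, which is one plausible reason the same source could show 32-bit stores in one run and 128-bit stores in another.

```cpp
#include <cstdio>

// Hypothetical helper: true when this translation unit was compiled by nvcc
// in device-debug mode (-G). With any other compiler or flags it is false.
bool device_debug_build() {
#ifdef __CUDACC_DEBUG__
    return true;
#else
    return false;
#endif
}

// Call from main() before launching the kernel.
void report_build_mode() {
    std::fprintf(stderr, "device debug build (-G): %s\n",
                 device_debug_build() ? "yes" : "no");
}
```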