How Does CUDA Interpret Stride Loops?


I'm having trouble understanding how a stride loop actually works when it is used for general iteration over arrays.

This is the example stride loop that I found. It is written for a single block and launched with <<<1, 256>>>:

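__global__
void add(int n, float *x, float *y)
{
  int index = threadIdx.x;   // this thread's position within the block
  int stride = blockDim.x;   // number of threads in the block (256 here)
  // start at this thread's own index, then jump ahead by the block size
  for (int i = index; i < n; i += stride)
    y[i] = x[i] + y[i];
}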

I'm guessing that it only runs the += stride once per block, and then runs the inner code once per thread. But nothing in the code actually specifies that: by normal C++ logic, the stride update would run every time the loop iterates.

Or does every single thread run the looping logic itself? It seems like that would hurt performance.
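To make the second reading concrete, here is a minimal host-side sketch in plain C++ (not CUDA) of what the kernel does if every thread executes the whole loop on its own, which is how kernel code is normally described: each thread runs the full function body with its own threadIdx.x. The function names indices_visited and touch_counts are hypothetical helpers made up for this illustration, and the emulation runs the "threads" one after another, whereas a GPU would run them in parallel.

```cpp
#include <vector>

// Hypothetical stand-in for ONE thread executing the kernel's loop.
// thread_index plays the role of threadIdx.x and block_dim the role of
// blockDim.x; both names are assumptions made for this sketch.
std::vector<int> indices_visited(int thread_index, int block_dim, int n) {
    std::vector<int> visited;
    // Same loop header as the kernel: the test and the += both run on
    // every iteration, separately for each thread.
    for (int i = thread_index; i < n; i += block_dim) {
        visited.push_back(i);
    }
    return visited;
}

// Serially emulates a <<<1, block_dim>>> launch and counts how many times
// each array element would be touched across all emulated threads.
std::vector<int> touch_counts(int block_dim, int n) {
    std::vector<int> counts(n, 0);
    for (int t = 0; t < block_dim; ++t) {
        for (int i : indices_visited(t, block_dim, n)) {
            counts[i] += 1;
        }
    }
    return counts;
}
```

Under this reading, with 256 threads and n = 1000, thread 0 would handle elements 0, 256, 512 and 768, thread 255 would handle 255, 511 and 767, and every element would be touched exactly once. Each thread only pays for about n / 256 iterations of loop bookkeeping, so the per-thread loop logic would not obviously be the expensive part.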
