Shared memory bank conflict in CUDA Fortran when loading 2D data from global memory

I am accessing global memory to load data to shared memory and would like to know if there is a bank conflict. Here is the setup:

In global memory, g_array is a 2D matrix of size (256, 64).

This is how I load the array data from global memory to shared memory. I called the kernel with gridDim (4, 1) and blockDim (16, 16).

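! Declarations must precede executable statements
real, shared :: s_array(0:15,0:15)

! 0-based global indices; threadIdx%x drives the first (fastest) subscript
d_j = (blockIdx%x-1) * blockDim%x + threadIdx%x-1
d_l = (blockIdx%y-1) * blockDim%y + threadIdx%y-1
! 0-based indices within the block
tIdx = threadIdx%x-1
tIdy = threadIdx%y-1

s_array(tIdx,tIdy) = g_array(d_j,d_l)
doSomethingWithMySharedMemoryData()
.....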

There is 1 best solution below

I haven't actually run your code, and my Fortran is not as good as my C/C++, but I believe that, generally speaking, your code should coalesce well on global memory accesses and not have bank conflicts on shared memory accesses.

The important factor is that you have matched the threadIdx%x index with the rapidly varying matrix subscript, which in Fortran is the first index (since Fortran arrays are stored in column-major order), whereas in C/C++ it is the second (or last) index (since C/C++ matrices are stored in row-major order).
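To make the column-major point concrete, here is a small host-side sketch in plain Fortran (no GPU needed) that lists the global memory element offsets touched by the first warp of the first block. It assumes g_array is declared with zero lower bounds, e.g. (0:255,0:63), which the question does not show, and that a warp is 32 threads with threadIdx%x varying fastest. The program and variable names are made up for illustration.

```fortran
! Host-side model only, not a kernel.
! Assumption: g_array has bounds (0:255,0:63), so the leading dimension is 256.
program coalesce_check
  implicit none
  integer, parameter :: ld = 256   ! leading dimension of g_array (assumed)
  integer, parameter :: bx = 16    ! blockDim%x from the question
  integer :: lane, d_j, d_l
  integer :: offset(0:31)

  ! First warp of block (1,1): blockIdx contributes 0 to both indices.
  do lane = 0, 31
    d_j = mod(lane, bx)            ! threadIdx%x - 1
    d_l = lane / bx                ! threadIdx%y - 1
    ! Column-major storage: the first subscript is contiguous in memory.
    offset(lane) = d_j + ld*d_l
  end do

  print *, 'lanes  0-15 read elements', offset(0), 'to', offset(15)
  print *, 'lanes 16-31 read elements', offset(16), 'to', offset(31)
end program coalesce_check
```

Under these assumptions the warp reads elements 0 to 15 and 256 to 271: two contiguous runs of 16 reals, one per value of threadIdx%y. Each run is contiguous because threadIdx%x drives the first subscript. A blockDim%x of 32 would turn the whole warp into a single contiguous run.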

Since you're not doing anything else with the subscripts other than using the thread indices directly, there should be no issue.

In general, with accesses like this, the same rules you use to achieve coalesced global memory access will also allow you to avoid bank conflicts on shared memory.
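A quick way to sanity-check the shared memory side without a GPU is to model the bank mapping on the host. The sketch below is plain Fortran and assumes 32 banks that are each 4 bytes wide, 4-byte reals, and 32-thread warps with threadIdx%x varying fastest. These match current NVIDIA hardware as far as I know, but check them for your device. The transposed store s_array(tIdy,tIdx) is hypothetical and included only for contrast.

```fortran
! Host-side model only, not a kernel.
! Assumptions: 32 banks, 4 bytes each, 4-byte reals, so bank = mod(word offset, 32).
program bank_check
  implicit none
  integer, parameter :: nbanks = 32, warp_size = 32
  integer, parameter :: bx = 16    ! blockDim%x, also the first extent of s_array
  integer :: lane, tIdx, tIdy, b
  integer :: hits_direct(0:nbanks-1), hits_transposed(0:nbanks-1)

  hits_direct = 0
  hits_transposed = 0

  ! First warp of a (16,16) block.
  do lane = 0, warp_size - 1
    tIdx = mod(lane, bx)
    tIdy = lane / bx
    ! s_array(tIdx,tIdy) as in the question: word offset tIdx + 16*tIdy
    b = mod(tIdx + bx*tIdy, nbanks)
    hits_direct(b) = hits_direct(b) + 1
    ! s_array(tIdy,tIdx), hypothetical transposed store: word offset tIdy + 16*tIdx
    b = mod(tIdy + bx*tIdx, nbanks)
    hits_transposed(b) = hits_transposed(b) + 1
  end do

  ! Every thread touches a distinct address in both cases, so the number of
  ! threads landing on one bank equals the conflict degree.
  print *, 'direct     max threads per bank:', maxval(hits_direct)
  print *, 'transposed max threads per bank:', maxval(hits_transposed)
end program bank_check
```

With these assumptions the store from the question puts each of the 32 threads on its own bank (maximum of 1, no conflict). The transposed store piles 8 threads onto each of 4 banks, an 8-way conflict. That contrast is the shared memory counterpart of the coalescing rule: keep threadIdx%x on the first subscript.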