Are there still shared memory bank conflicts in NVIDIA CUDA compute capability 7.0 and above?


If all threads in the same block access the same address (e.g. array[0]), some older compute capabilities incur a bank conflict. Does this conflict still exist on the latest compute capabilities (e.g. 7.0 for the V100 or 8.0 for the A100)?

huseyin tugrul buyukisik

According to an NVIDIA blog post, compute capability 2.0 has a multicast (and broadcast) feature that converts accesses to the same address into a single memory request. Bank conflicts are therefore not caused by accesses to the same address. They are caused by different addresses that map to the same bank, i.e. addresses whose word index gives the same result modulo the number of banks.
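The modulo calculation can be sketched in a few lines of host-side C++. This is only an illustration under stated assumptions: 32 banks, each 4 bytes wide, with successive 32-bit words assigned to successive banks. The names `kNumBanks`, `kBankWidthBytes` and `bank_of` are made up for this example and are not part of any CUDA API.

```cpp
#include <cstdint>

// Assumed model (not queried from hardware): 32 banks, each one 32-bit word wide.
constexpr std::uint32_t kNumBanks = 32;
constexpr std::uint32_t kBankWidthBytes = 4;

// Hypothetical helper: bank that serves a given byte offset into shared memory.
// Successive 4-byte words map to successive banks, wrapping around after 32.
constexpr std::uint32_t bank_of(std::uint32_t byte_offset) {
    return (byte_offset / kBankWidthBytes) % kNumBanks;
}
```

Under this model, elements 0, 32 and 64 of a `float` array sit at byte offsets 0, 128 and 256, and `bank_of` returns bank 0 for all three, while elements 0 and 1 land in banks 0 and 1.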

In your example, all threads access the same address, so the hardware performs a single broadcast and there is no conflict. To generate a true bank conflict, the threads of a warp need to access different addresses such as 0, stride, 2 × stride, 3 × stride, etc., where the stride is a multiple of the number of banks, so that no multicast is possible and the accesses are serialized on the same shared memory bank.
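The difference between a broadcast and a true conflict can be checked with a small host-side model. This is a simplified sketch, assuming 32 threads per warp, 32 banks of one 32-bit word each, and that accesses to the same word are merged into one request. The function `conflict_degree` is a hypothetical helper, and real hardware may differ in details.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <set>

// Assumed model parameters, not queried from a device.
constexpr std::uint32_t kWarpSize = 32;
constexpr std::uint32_t kNumBanks = 32;

// Hypothetical helper: number of serialized passes needed when lane i of a
// warp reads the 32-bit word with index i * stride_words.
// Lanes that read the same word are counted once (broadcast/multicast).
int conflict_degree(std::uint32_t stride_words) {
    // For each bank, collect the distinct words requested from it.
    std::array<std::set<std::uint32_t>, kNumBanks> words_per_bank;
    for (std::uint32_t lane = 0; lane < kWarpSize; ++lane) {
        const std::uint32_t word = lane * stride_words;
        words_per_bank[word % kNumBanks].insert(word);
    }
    // The busiest bank determines how many passes the access takes.
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

In this model a stride of 0 (every thread reads `array[0]`) gives a degree of 1, the same as the conflict-free stride of 1. A stride of 2 gives a 2-way conflict, and a stride of 32 gives a 32-way conflict because every lane hits bank 0 with a different word.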

The Volta architecture (compute capability 7.0) still has shared memory bank conflicts.

If shared memory has 32 banks, then the 32-bit words at indices n, n+32, n+64, ... all fall in the same bank, and accessing them at the same time from one warp causes a bank conflict. That will remain the case unless they invent something like a dual-pipelined shared memory bank.