On devices with compute capability <= 7.2, I always use
nvprof --events shared_st_bank_conflict
but when I run it on an RTX 2080 Ti with CUDA 10, it returns
Warning: Skipping profiling on device 0 since profiling is not supported on devices with compute capability greater than 7.2
So how can I detect whether there are shared memory bank conflicts on this device?
I've installed NVIDIA Nsight Systems and Nsight Compute, but found no such profiling report...
Thanks.
As others pointed out, nvprof has been replaced by Nsight Compute; check the nvprof-to-Nsight Compute metric equivalence mapping in its documentation.
In particular, shared_efficiency gets mapped to smsp__sass_average_data_bytes_per_wavefront_mem_shared (cryptic!). I have a feeling that more metrics suffered during this transition.

Jokes aside, let's demonstrate how to use it. To this end, take a kernel that intentionally causes bank conflicts: thread x of a warp accesses the shared-memory int at index x * offset.
This kernel should cause conflicts unless the offset is relatively prime to 32, i.e. unless it is odd. The kernel is launched with a single warp and the chosen offset.
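A minimal sketch of what such a kernel and its launch might look like. The kernel name strided_shared_access, the array size, the iteration count, and passing the offset as the first command-line argument are all assumptions made for illustration, not details from the original post.

```cuda
// bank_conflicts.cu: illustrative sketch. The kernel name, array size and
// iteration count are assumptions chosen for this example.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

constexpr int kWarpSize   = 32;    // a single warp, so all 32 accesses issue together
constexpr int kSharedInts = 4096;  // 16 KiB of shared memory
// Largest offset for which index 31 * offset still fits in the array.
constexpr int kMaxOffset  = (kSharedInts - 1) / (kWarpSize - 1);

__global__ void strided_shared_access(int offset, int iters, unsigned int* out)
{
    // volatile forces a real shared-memory load and store on every iteration
    // instead of letting the compiler keep the value in a register.
    __shared__ volatile unsigned int smem[kSharedInts];

    // Cooperatively zero the array.
    for (int i = threadIdx.x; i < kSharedInts; i += blockDim.x) smem[i] = 0;
    __syncthreads();

    // Thread x touches element x * offset, which lives in bank (x * offset) % 32.
    // For offset >= 1 every thread has its own element, so there is no data race.
    const unsigned int idx = threadIdx.x * offset;
    for (int i = 0; i < iters; ++i) smem[idx] = smem[idx] + i;

    out[threadIdx.x] = smem[idx];  // keep the result observable
}

int main(int argc, char* argv[])
{
    const int offset = (argc > 1) ? std::atoi(argv[1]) : 1;
    if (offset < 1 || offset > kMaxOffset) {
        std::fprintf(stderr, "usage: %s <offset in 1..%d>\n", argv[0], kMaxOffset);
        return EXIT_FAILURE;
    }

    unsigned int* d_out = nullptr;
    if (cudaMalloc(&d_out, kWarpSize * sizeof(unsigned int)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return EXIT_FAILURE;
    }

    strided_shared_access<<<1, kWarpSize>>>(offset, 10000, d_out);
    cudaError_t err = cudaGetLastError();               // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err != cudaSuccess)
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Once built, the binary can be profiled with something along the lines of
ncu --metrics smsp__sass_average_data_bytes_per_wavefront_mem_shared ./bank_conflicts 2
where the trailing 2 is the offset under the assumption above. Older Nsight Compute releases ship the command-line tool as nv-nsight-cu-cli rather than ncu, and access to the performance counters may require elevated permissions.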
Compile with
nvcc bank_conflicts.cu -o bank_conflicts
and we are ready to demonstrate conflict detection by profiling the binary with Nsight Compute, requesting the metric above for a few different offsets.

As a bonus, let's establish the following fact: every accessed bank is hit k = GCD(offset, 32) times, hence the reported efficiency equals 1/k. Why is that? A single int takes 32 bits = 4 bytes and fits in a single bank (banks are 4 bytes wide), so thread x requests bank number
bank = (x * offset) % 32
As x runs over 0..31, this mapping takes every value in its range exactly k = GCD(offset, 32) times, which follows from properties of linear maps and elementary number theory :-)
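For anyone who wants the number theory spelled out, here is a short derivation of the 1/k claim. It assumes 32 banks of 4 bytes and an offset of at least 1 (so the 32 threads touch distinct addresses); the name $f$ is introduced here only for the argument.

```latex
Let $f(x) = (x \cdot \mathrm{offset}) \bmod 32$ for $x \in \{0, \dots, 31\}$
and $k = \gcd(\mathrm{offset}, 32)$. For two threads $x$ and $y$,
\[
  f(x) = f(y)
  \iff 32 \mid (x - y)\,\mathrm{offset}
  \iff \tfrac{32}{k} \mid (x - y)\,\tfrac{\mathrm{offset}}{k}
  \iff \tfrac{32}{k} \mid (x - y),
\]
where the last step uses $\gcd\!\left(\tfrac{32}{k}, \tfrac{\mathrm{offset}}{k}\right) = 1$.
So two threads share a bank exactly when their indices differ by a multiple
of $32/k$, and each such residue class contains $k$ of the $32$ threads.
Hence $f$ hits $32/k$ distinct banks (the multiples of $k$), each exactly
$k$ times: a $k$-way conflict, and an efficiency of $1/k$.

Example: $\mathrm{offset} = 6$ gives $k = 2$. Threads $x$ and $x + 16$ collide
because $16 \cdot 6 = 96 \equiv 0 \pmod{32}$, so the $16$ even banks are
each hit twice and the efficiency is $1/2$.
```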