I am doing a detailed code analysis for which I want to measure the total number of bank conflicts per warp.
The nvvp documentation lists this metric, which was the only one I could find related to bank conflicts:
shared_replay_overhead: Average number of replays due to shared memory conflicts for each instruction executed
When I profile the metric using nvprof (or nvvp) I get a result like this:
Invocations  Metric Name             Metric Description             Min       Max       Avg
Device "Tesla K20m (0)"
Kernel: void matrixMulCUDA<int=32>(float*, float*, float*, int, int)
301          shared_replay_overhead  Shared Memory Replay Overhead  0.089730  0.089730  0.089730
I need to either use this value, 0.089730, or devise some other method to measure the number of bank conflicts.
I understand that this value is the 'average' taken across all the warps that are executing. If I need the total number of bank conflicts per warp, is there a way to derive it from the nvprof results?
Possible approaches that came to my mind:
- By using the shared_replay_overhead results in a formula to calculate the number of bank conflicts. I am guessing I have to apply some sort of formula like shared_replay_overhead * Total number of warps launched, where I know the Total number of warps launched in advance, but I can't figure out what.
- By first detecting that it's a four-way bank conflict, eight-way bank conflict, etc., and then multiplying 4/8 by the number of times the shared memory operation takes place (how to measure that?).
This probably requires fairly good technical knowledge of the GPU architecture, in addition to the nvprof results, which I don't think I have yet. For the record, my GPU is of the Kepler architecture, SM 3.5.
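To make the "n-way conflict" idea concrete, here is a minimal sketch of how the conflict degree of a single warp-wide access could be estimated on the host. It assumes a simplified model that is not taken from the profiler documentation: 32 banks, 4-byte words, word w stored in bank w % 32, and threads that touch the same word being served together. Kepler's real mapping depends on the configured bank size (4 or 8 bytes), so the result is an estimate only. The names conflictDegree and stridedAccess are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Simplified model (assumption): 32 banks, 4-byte words,
// word w lives in bank w % 32. Threads touching the same
// word are served together, so only distinct words within
// a bank count toward a conflict.
constexpr int kNumBanks = 32;
constexpr int kWordBytes = 4;

// Returns the conflict degree for one warp-wide access:
// 1 means conflict-free, n means an n-way bank conflict.
int conflictDegree(const std::vector<std::uint32_t>& byteAddrs) {
    std::array<std::set<std::uint32_t>, kNumBanks> wordsPerBank;
    for (std::uint32_t addr : byteAddrs) {
        std::uint32_t word = addr / kWordBytes;
        wordsPerBank[word % kNumBanks].insert(word);
    }
    std::size_t degree = 1;
    for (const auto& words : wordsPerBank) {
        degree = std::max(degree, words.size());
    }
    return static_cast<int>(degree);
}

// Hypothetical helper: byte addresses generated when thread tid
// of a 32-thread warp reads float element tid * strideInFloats.
std::vector<std::uint32_t> stridedAccess(int strideInFloats) {
    std::vector<std::uint32_t> addrs;
    for (int tid = 0; tid < 32; ++tid) {
        addrs.push_back(
            static_cast<std::uint32_t>(tid * strideInFloats * kWordBytes));
    }
    return addrs;
}
```

Under this model, conflictDegree(stridedAccess(1)) is 1 (no conflict) and conflictDegree(stridedAccess(4)) is 4. An n-way conflict would then cost n - 1 extra passes for that one instruction, which is the quantity that would still have to be multiplied by how often the instruction executes.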
Even if I can measure the number of bank conflicts per block instead of per warp, it will suffice. After that I can do the necessary calculations to get the value on a per-warp basis.
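The per-warp arithmetic could look something like the sketch below. It rests on an assumption worth checking against the profiler documentation: since the metric is described as replays "for each instruction executed", the multiplier is taken to be the warp-level instruction count (for example, the value nvprof reports as inst_executed) rather than the number of warps. The function names are hypothetical, and any instruction count passed in has to come from an actual profiling run.

```cpp
#include <cstdint>

// Total number of warps in a launch: each block is split
// into ceil(threadsPerBlock / 32) warps.
std::uint64_t warpsLaunched(std::uint64_t blocks,
                            std::uint64_t threadsPerBlock) {
    const std::uint64_t warpSize = 32;
    return blocks * ((threadsPerBlock + warpSize - 1) / warpSize);
}

// Assumption: shared_replay_overhead = replays / inst_executed,
// where inst_executed counts warp-level instructions.
double estimatedTotalReplays(double sharedReplayOverhead,
                             std::uint64_t instExecuted) {
    return sharedReplayOverhead * static_cast<double>(instExecuted);
}

// Average number of shared memory replays attributed to each warp.
double estimatedReplaysPerWarp(double sharedReplayOverhead,
                               std::uint64_t instExecuted,
                               std::uint64_t warps) {
    return estimatedTotalReplays(sharedReplayOverhead, instExecuted) /
           static_cast<double>(warps);
}
```

For example, a launch of 10 blocks with 1024 threads each gives warpsLaunched(10, 1024) == 320. Note that this yields an average per warp, not an exact count for any individual warp.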
I think you should look at the CUPTI (CUDA Profiling Tools Interface) documentation. There are also a few examples shipped with your CUDA SDK in the
/extras/CUPTI
directory. I'm not very familiar with this library, but it looks like you can write your own profiler and measure what you want, or collect the metrics you're interested in. It will be low level, but this is what you need to get a precise answer.