How can a registers-only instruction stall due to "memory dependencies"?


I am profiling a CUDA kernel using nvprof with PC sampling enabled, in order to understand some latency problems I am having. The GPU I am using is a P100 (compute capability 6.0).

PC sampling reports that a DFMA is stalling frequently due to memory dependencies. The SASS code for the DFMA is as follows:

 DFMA R22, R4, R8, R22

My take on the problem is that R8 has to be loaded via an LDG.E.CI.64 that has a very high miss rate in L2.
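A minimal sketch of the kind of kernel that could produce this pattern is shown here. It is not the actual kernel being profiled: the name `gather_fma`, the arrays, the helper `run_gather_fma`, and the index pattern are all hypothetical. Whether the compiler really emits an LDG.E.CI.64 feeding a DFMA for this source is an assumption that depends on the toolkit version and flags.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread accumulates a[i] * b[idx[i]] in double precision.
// The scattered read of b goes through the read-only data path (__ldg), and the
// fma is the first instruction that consumes the loaded value.
__global__ void gather_fma(const double* __restrict__ a,
                           const double* __restrict__ b,
                           const int* __restrict__ idx,
                           double* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    double acc = 0.0;
    for (int i = tid; i < n; i += stride) {
        double x = a[i];
        double y = __ldg(&b[idx[i]]);  // long-latency global load
        acc = fma(x, y, acc);          // waits until y has arrived in its register
    }
    out[tid] = acc;
}

// Hypothetical host helper: fills the inputs, runs the kernel, returns the total.
double run_gather_fma(int n)
{
    const int blocks = 64, threads = 256, total = blocks * threads;
    double *a, *b, *out;
    int *idx;
    cudaMallocManaged(&a, n * sizeof(double));
    cudaMallocManaged(&b, n * sizeof(double));
    cudaMallocManaged(&idx, n * sizeof(int));
    cudaMallocManaged(&out, total * sizeof(double));

    for (int i = 0; i < n; ++i) {
        a[i] = 1.0;
        b[i] = 2.0;
        // Scatter the indices so that neighbouring threads read far-apart elements.
        idx[i] = (int)(((long long)i * 9973) % n);
    }

    gather_fma<<<blocks, threads>>>(a, b, idx, out, n);
    cudaDeviceSynchronize();

    double sum = 0.0;
    for (int t = 0; t < total; ++t) sum += out[t];

    cudaFree(a); cudaFree(b); cudaFree(idx); cudaFree(out);
    return sum;
}

int main()
{
    printf("sum = %f\n", run_gather_fma(1 << 20));
    return 0;
}
```

Compiling this with `nvcc -arch=sm_60 -lineinfo` and dumping the binary with `cuobjdump -sass` should show whether the load and the DFMA end up adjacent in the generated code, as they do in the profiled kernel.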

The definition of a memory dependency stall is "A load/store cannot be made because the required resources are not available or are fully utilized, or too many requests of a given type are outstanding."

What confuses me is that a DFMA is not a load/store operation. If I am right that the stall is caused by the data in R8 not being available yet, then it should be reported as an execution dependency. What does a memory dependency stall on a DFMA mean?
