I'd like to get some insight into how constant memory is allocated (using CUDA 4.2). I know that the total available constant memory is 64KB, but when is this memory actually allocated on the device? Does this limit apply to each kernel, to each CUDA context, or to the whole application?
Let's say there are several kernels in a .cu file, each using less than 64K of constant memory, but their total constant memory usage is more than 64K. Is it possible to call these kernels sequentially? What happens if they are called concurrently using different streams?
What happens if there is a large CUDA dynamic library with lots of kernels, each using different amounts of constant memory?
What happens if there are two applications, each requiring more than half of the available constant memory? The first application runs fine, but when will the second app fail: at app start, at the cudaMemcpyToSymbol() calls, or at kernel execution?
Parallel Thread Execution ISA Version 3.1 section 5.1.3 discusses constant banks.
A simple program can be used to illustrate the use of constant memory.
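The original program is not reproduced here; below is a minimal sketch along the same lines (the kernel, variable names, and values are my assumptions, not from the source, and error checking is omitted). It touches both a user-defined __constant__ variable and ordinary kernel parameters, so both constant banks show up in the disassembly:

#include <cstdio>

// User-defined __constant__ data; on sm_30 this lands in constant bank 3.
__constant__ int coeffs[4];

// The parameters (in, out, n) are passed through constant bank 0.
__global__ void scale(const int* in, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * coeffs[i % 4];
}

int main()
{
    const int n = 16;
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int h_coeffs[4] = { 1, 2, 3, 4 };
    cudaMemcpyToSymbol(coeffs, h_coeffs, sizeof(h_coeffs));

    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    scale<<<1, n>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("%d ", h_out[i]);
    printf("\n");
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Assuming the file is saved as const_demo.cu (a name I've chosen), the build described next would be, e.g.:

nvcc -gencode arch=compute_30,code=sm_30 -o const_demo const_demo.cu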
Compile this for compute_30, sm_30 and execute
cuobjdump -sass <executable or obj>
to disassemble it. In the SASS you should see constant-bank references like the following (I have annotated to the right of the SASS).
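The original answer's disassembly is not reproduced here; these are representative Kepler-style SASS lines (not an actual dump) showing what to look for:

/*0008*/  MOV R1, c[0x0][0x44];    // bank 0: stack pointer set up by the driver
/*0010*/  MOV R2, c[0x0][0x140];   // bank 0, offset 0x140: first kernel parameter
/*0050*/  LDC R0, c[0x3][R0];      // bank 3: user-defined __constant__ data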
On sm_30 you can see that kernel parameters are passed in constant bank 0 starting at offset 0x140. User-defined __constant__ variables are placed in constant bank 3.

If you execute
cuobjdump --dump-elf <executable or obj>
you can find other interesting constant information.

The kernel-parameter constant bank is versioned per launch, so that concurrent kernels can be executed. The compiler-generated and user constants are per CUmodule. It is the responsibility of the developer to manage the coherency of this data. For example, the developer has to ensure that a cudaMemcpyToSymbol() update is performed at a safe point, i.e. not while a kernel that reads the symbol may still be running.
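A minimal sketch of one conservative way to do that (the symbol and kernel names are hypothetical, not from the original answer): synchronize before overwriting a __constant__ symbol that an in-flight kernel may read, since, unlike kernel parameters, the symbol is not versioned per launch.

__constant__ float threshold;

__global__ void filter(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = (in[i] > threshold) ? in[i] : 0.0f;
}

void run_twice(const float* d_in, float* d_out, int n)
{
    float t = 0.5f;
    cudaMemcpyToSymbol(threshold, &t, sizeof(t)); // set before the launch that reads it
    filter<<<(n + 255) / 256, 256>>>(d_in, d_out, n);

    // The symbol lives in the module, not in the launch, so make sure the
    // first kernel has finished before overwriting it (conservative: a full
    // device sync; per-stream synchronization would also work).
    cudaDeviceSynchronize();

    t = 0.9f;
    cudaMemcpyToSymbol(threshold, &t, sizeof(t));
    filter<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
}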