CUDA architecture -sm_11 compile issue in NSight


My GPU, a Quadro FX 3700, doesn't support architectures above sm_11, so I was not able to use relocatable device code (rdc), which requires sm_20 or later. Hence I combined all the utilities I need into one large file (say x.cu). To give an overview, x.cu contains 2 classes with 5 member functions each, 20 device functions, 1 global kernel, and 1 kernel-caller function.

Now, when I try to compile in Nsight, the build just hangs at 3%. When I compile from the command line using

nvcc x.cu -o output -I"."

it prints the following messages and finishes compiling only after a long time:

/tmp/tmpxft_0000236a_00000000-9_Kernel.cpp3.i(0): Warning: Olimit was exceeded on function _Z18optimalOrderKernelPdP18PrepositioningCUDAdi; will not perform function-scope optimization.
    To still perform function-scope optimization, use -OPT:Olimit=0 (no limit) or -OPT:Olimit=45022
/tmp/tmpxft_0000236a_00000000-9_Kernel.cpp3.i(0): Warning: To override Olimit for all functions in file, use -OPT:Olimit=45022
    (Compiler may run out of memory or run very slowly for large Olimit values)

where optimalOrderKernel is the global kernel. Compiling shouldn't take this long, so I want to understand the reason behind these messages, particularly Olimit.


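Olimit is pretty clear, I think. It is a limit the compiler places on the amount of effort it will expend on optimizing a single function; when a function exceeds it, as your kernel does, function-scope optimization is skipped for that function.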

Most codes compile just fine using nvcc. However, no compiler is perfect, and some seemingly innocuous codes can cause the compiler to spend a long time at an optimization process that would normally be quick.

Since you haven't provided any code, I'm speaking in generalities.

Because the compiler occasionally spends a disproportionately long time in certain optimization phases, the Olimit warning gives you some idea of why a compile is taking so long. The Olimit also acts as a watchdog on an optimization process that is taking too long: when it is exceeded, certain optimization steps are aborted and a "less optimized" version of your code is generated instead.

I think the compiler messages you received are quite clear on how to modify the Olimit, depending on your intentions. You can raise it (for example to the value 45022 that the warning suggests) to extend the watchdog period, or disable it entirely by setting it to zero. If you disable it, the compile could take an arbitrarily long time and/or run out of memory, as the messages indicate.
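As a sketch, assuming an older CUDA toolkit in which nvcc still compiles sm_1x code with nvopencc (the Open64-derived compiler that emits these Olimit warnings), the -OPT:Olimit option from the warning can be forwarded to it with nvcc's -Xopencc flag (long form --opencc-options). The file name and include path come from the question; -arch=sm_11 is an added assumption to match the Quadro FX 3700. For an Nsight build, the same flags would presumably go into the project's extra nvcc options.

```shell
# Assumes an older toolkit where sm_1x code goes through nvopencc.
# Raise the limit to the value suggested by the warning:
nvcc -arch=sm_11 -Xopencc -OPT:Olimit=45022 x.cu -o output -I"."

# Or remove the limit entirely (compile time and memory use are then unbounded):
nvcc -arch=sm_11 -Xopencc -OPT:Olimit=0 x.cu -o output -I"."
```

Either way, this trades compile time for a more optimized kernel; it does not make the kernel itself any smaller.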