My GPU, a Quadro FX 3700, doesn't support architectures above sm_11, so I was not able to use relocatable device code (rdc). Hence I combined all the utilities needed into one large file (say x.cu). To give an overview, x.cu contains 2 classes with 5 member functions each, 20 device functions, 1 global kernel, and 1 kernel caller function.
Now, when I try to compile via Nsight, it just hangs with the build progress showing 3%. When I try compiling using
nvcc x.cu -o output -I"."
it shows the following messages and compiles only after a long time:
/tmp/tmpxft_0000236a_00000000-9_Kernel.cpp3.i(0): Warning: Olimit was exceeded on function _Z18optimalOrderKernelPdP18PrepositioningCUDAdi; will not perform function-scope optimization.
To still perform function-scope optimization, use -OPT:Olimit=0 (no limit) or -OPT:Olimit=45022
/tmp/tmpxft_0000236a_00000000-9_Kernel.cpp3.i(0): Warning: To override Olimit for all functions in file, use -OPT:Olimit=45022
(Compiler may run out of memory or run very slowly for large Olimit values)
Here optimalOrderKernel is the global kernel. Compiling shouldn't take this long, so I want to understand the reason behind these messages, particularly Olimit.
Olimit is pretty clear, I think. It is a limit the compiler places on the amount of effort it will expend on optimizing code.
Most codes compile just fine using nvcc. However, no compiler is perfect, and some seemingly innocuous codes can cause the compiler to spend a long time at an optimization process that would normally be quick.
Since you haven't provided any code, I'm speaking in generalities.
Since there is the occasional case where the compiler spends a disproportionately long time in certain optimization phases, the Olimit acts as a convenient watchdog on an optimization process that is taking too long, and its warning gives you some idea of why the compile is slow. When the limit is exceeded, certain optimization steps are aborted, and a "less optimized" version of your code is generated instead.
I think the compiler messages you received are quite clear on how to modify the Olimit depending on your intentions. You can override it to increase the watchdog period, or disable it entirely (by setting it to zero). In that case, the compile process could take an arbitrarily long period of time, and/or run out of memory, as the messages indicate.
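The warning names the option but not how to hand it to nvcc. Here is a minimal sketch, assuming your toolkit still uses the Open64-based front end for sm_1x targets and that your nvcc accepts -Xopencc (long form --opencc-options) to forward options to it; check nvcc --help for your version before relying on this. The file name, output name, and include path are taken from your command, and the variable names are only illustrative.

```shell
# Olimit value suggested by the warning text; 0 would mean "no limit".
OLIMIT=45022

# Assumption: this nvcc forwards -Xopencc arguments to the Open64 front end.
BUILD_CMD="nvcc -Xopencc -OPT:Olimit=${OLIMIT} x.cu -o output -I."

# Print the command so it can be checked before running it by hand.
echo "${BUILD_CMD}"
```

Raising the limit trades compile time and memory for function-scope optimization of the kernel, so it may be worth first trying the value from the warning rather than 0.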