I'm using Callgrind/KCachegrind for the first time to profile my C++ application, and I noticed that the two functions that take the most time are:
- < cycle 1 > (50% self) and
- do_lookup_x (15% self)
As I understand it, cycle 1 relates to how time is estimated for recursively called functions, but it is not clear to me how I should interpret such a high share of time spent there. If there are cycles, I would like to see which function is called most often and which one ends up taking the most CPU time. If I disable Cycle Detection (View->Cycle Detection), cycle 1 disappears, but the "Self" times then sum to roughly 60%, and I'm not sure this is the right thing to do. Regarding do_lookup_x, I'm totally clueless...
Can you clarify how I should interpret these results?
Thanks in advance.
KCachegrind may detect cycles incorrectly; see the Callgrind manual: http://valgrind.org/docs/manual/cl-manual.html#cl-manual.cycles
Try turning off Cycle Detection in KCachegrind's View menu and rely on the "Self" time column, since the "Incl" column will then be incorrect.
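To make the idea of a cycle concrete, here is a minimal sketch of code whose call graph contains one. The function names (is_even, is_odd, count_even) are made up for illustration and have nothing to do with your application. Since is_even calls is_odd and is_odd calls is_even, a profiler that only records caller/callee pairs cannot attribute inclusive time to either function separately, so I would expect them to be grouped into a cycle entry when Cycle Detection is on.

```cpp
#include <cstdint>

// Forward declaration so the two functions can call each other.
bool is_odd(std::uint32_t n);

// is_even -> is_odd -> is_even -> ... forms a cycle in the call graph.
bool is_even(std::uint32_t n) {
    return n == 0 ? true : is_odd(n - 1);
}

bool is_odd(std::uint32_t n) {
    return n == 0 ? false : is_even(n - 1);
}

// Hypothetical driver: counts the even numbers in [0, limit).
// All of the work happens inside the is_even/is_odd cycle.
std::uint32_t count_even(std::uint32_t limit) {
    std::uint32_t count = 0;
    for (std::uint32_t i = 0; i < limit; ++i) {
        if (is_even(i)) {
            ++count;
        }
    }
    return count;
}
```

If you want to try it, call count_even from a small main, build with debug info and without optimization (an optimizing compiler may turn the tail calls into a loop and remove the recursion), and run it under valgrind --tool=callgrind. The "Self" cost of is_even and is_odd stays meaningful either way; it is the inclusive cost inside the cycle that is hard to interpret.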
You can also try another profiler that records the exact, full call stack. Many of the profilers supported by the https://github.com/jrfonseca/gprof2dot script save full stacks, rather than only the caller-callee pairs stored in the callgrind/cachegrind format.