How to get call-graph profiling working with GCC-compiled code on an ARM Cortex-A8 target?


I am pulling my hair out over this one...

I need to profile code on an ARM board and view call graphs. I tried OProfile, the kernel's perf, and Google Performance Tools (gperftools). All of them work fine, but none of them outputs any call-graph information.

This led me to the conclusion that I am not compiling my code correctly.

I use the following flags when compiling my C++ code:

Arch specific:

-march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=vfpv3

General:

-fexceptions -fno-strict-aliasing -D_REENTRANT -Wall -Wextra

Debugging (with optimization):

-O2 -g -fno-omit-frame-pointer
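Before digging into unwinding, it can help to profile a tiny program whose call graph is known in advance, so that the real application is ruled out. The sketch below is a hypothetical test case (the names `hot_leaf`, `caller_a`, `caller_b` and `run_workload` are made up for illustration): `hot_leaf` is reached from two callers that ask for different amounts of work, so a profiler with working call graphs should attribute roughly one third of its samples to the path through `caller_a` and two thirds to the path through `caller_b`, while a flat profile only shows a single `hot_leaf` entry. `noinline` keeps each function as its own frame under `-O2`, and the arithmetic after each call keeps GCC from turning it into a tail call, which would hide the caller.

```cpp
#include <cstdint>

// Hypothetical workload. Multiplying acc by a constant each iteration is
// intended to stop the optimizer from folding the loop into a closed form.
// Unsigned overflow wraps modulo 2^64, which is well defined.
__attribute__((noinline)) std::uint64_t hot_leaf(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        acc = acc * 31 + i;
    }
    return acc;
}

// Two distinct callers. The "+ 1" / "+ 2" after the call means the call is
// not in tail position, so the caller's frame stays on the stack.
__attribute__((noinline)) std::uint64_t caller_a(std::uint64_t n) {
    return hot_leaf(n) + 1;
}

__attribute__((noinline)) std::uint64_t caller_b(std::uint64_t n) {
    return hot_leaf(2 * n) + 2;  // twice as many iterations as caller_a
}

// Entry point to call from main() with a large n.
__attribute__((noinline)) std::uint64_t run_workload(std::uint64_t n) {
    return caller_a(n) ^ caller_b(n);
}
```

A `main` that calls `run_workload` with an `n` large enough for the run to last a few seconds, and prints the result so the work is not discarded, is all that is needed; build it with exactly the flags listed above. If the profiler still shows `hot_leaf` only as a flat entry with no callers, the problem lies in how the stack is unwound on the target rather than in the application code.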

I did a lot of Google searching and found some related topics:

  • libunwind?
  • DWARF
  • (asynchronous) unwind tables (-funwind-tables, -fasynchronous-unwind-tables)
  • -mapcs-frame

However, I do not fully understand how these are connected. Any hints on how to get call graphs working?
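As I understand it, these pieces fit together roughly like this: the compiler can emit unwind tables (DWARF `.eh_frame` on most targets; on 32-bit ARM EABI, the `.ARM.exidx`/`.ARM.extab` sections that `-funwind-tables` and `-fasynchronous-unwind-tables` control, and which C++ built with `-fexceptions` normally gets anyway), and a runtime unwinder such as libgcc's or libunwind walks the stack using those tables. Profilers may instead follow frame pointers. As far as I know, the 32-bit ARM user-space call-chain code in the kernel that perf and OProfile used at the time expects the APCS frame layout GCC produces with `-mapcs-frame` in ARM (not Thumb-2) mode, which would explain why `-fno-omit-frame-pointer` alone is not enough. The sketch below is a diagnostic, not a fix: it uses glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>` to check whether unwind tables are present. The helper names are hypothetical. If the depth reported from deep inside your code on the board is only 1 or 2, unwinding is broken there; a deep trace, however, only shows the unwind tables work and says nothing about the kernel's frame-pointer walk.

```cpp
#include <execinfo.h>  // glibc-specific: backtrace(), backtrace_symbols_fd()
#include <unistd.h>    // STDERR_FILENO

// Hypothetical helper: return how many frames the unwinder can walk from
// here. backtrace() fills `frames` with return addresses, starting with
// the one inside this function.
__attribute__((noinline)) int stack_depth_here() {
    void* frames[64];
    return backtrace(frames, 64);
}

// Hypothetical helper: print the current stack to stderr. Function names
// only appear for symbols in the dynamic symbol table (link with -rdynamic);
// otherwise entries are module+offset, which addr2line can resolve.
__attribute__((noinline)) void dump_stack_here() {
    void* frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
}
```

Calling `dump_stack_here()` from a deeply nested function in your own code, built with the flags above, on both the x86-64 machine and the board makes it easy to compare how far each unwinder gets.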

Note (in response to Rian's answer): I want to find out whether, and why, some methods take longer (relative to others) on ARM than on x86-64. Profiling on a different platform does not help with that, even though my code compiles on both and I can get call graphs on x86-64.

Answer:

I know you want to profile on an ARM Cortex-A8, but if you're interested in call graphs, why not compile for x86, run Valgrind's Callgrind tool, and examine the results with KCachegrind?

The call graphs should be essentially the same on both architectures: even if the compiler generates slightly different code for each function, the caller/callee relationships shouldn't change (unless the compiler makes different inlining decisions for the two targets).

No special compiler flags are needed:

valgrind --tool=callgrind -v --dump-every-bb=10000000 ./some-app
kcachegrind &