I want to measure specific parts of my code to understand how well they perform. I already found the following:
How to get the CPU cycle count in x86_64 from C++?
...which allows me to easily calculate the exact number of CPU cycles elapsed between two calls to rdtsc.
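For reference, this is roughly the pattern I'm using now; `__rdtsc()` is available as an intrinsic in both Clang and MSVC:

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

// Count TSC ticks across a code region between two rdtsc reads.
uint64_t measure_cycles() {
    uint64_t start = __rdtsc();

    // ... code under measurement goes here ...

    return __rdtsc() - start;
}
```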
What I'm still missing is a way to measure how efficiently the code ran in general, i.e. the number of cache misses and/or the actual number of instructions executed between the two rdtsc calls.
If I could just read the number of instructions executed so far, I could compare it to the rdtsc results and learn approximately how many cycles each instruction took. That would be a fairly good measure of how much time was spent waiting for memory accesses to finish: the lower the ratio of CPU cycles spent to CPU instructions executed, the more efficiently the code is running, and most of the inefficiency is likely caused by cache misses.
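To make that concrete, this is the comparison I have in mind. `read_instructions_retired()` is hypothetical here; implementing it is exactly the part I don't know how to do:

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

// Hypothetical: returns the number of instructions retired so far.
// How to implement this is what I'm asking about.
uint64_t read_instructions_retired();

double measure_cpi() {
    uint64_t c0 = __rdtsc();
    uint64_t i0 = read_instructions_retired();

    // ... code under measurement goes here ...

    uint64_t cycles       = __rdtsc() - c0;
    uint64_t instructions = read_instructions_retired() - i0;

    // Lower cycles-per-instruction should mean fewer memory stalls.
    return double(cycles) / double(instructions);
}
```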
So is there a way to read the exact number of CPU instructions executed so far? What about the exact number of cache misses, or even the total number of memory accesses?
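The closest lead I have is the rdpmc instruction: if I understand correctly (and I may not), it can read Intel's fixed-function counter 0, which counts instructions retired, but it faults unless the OS has programmed the counter and allowed user-mode access via CR4.PCE, and I don't know whether macOS or Windows does that. So the following is only a sketch of one possible `read_instructions_retired()`, not something I've gotten to work:

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Sketch only: read Intel fixed-function counter 0 ("instructions retired")
// via rdpmc. Setting bit 30 of ECX selects the fixed-counter space; the low
// bits pick the counter. This faults unless the OS has programmed the counter
// and set CR4.PCE to permit user-mode rdpmc, which I can't rely on here.
static inline uint64_t read_instructions_retired() {
#if defined(_MSC_VER)
    return __readpmc(1u << 30);   // fixed counter 0
#else
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(1u << 30));
    return (uint64_t(hi) << 32) | lo;
#endif
}
```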
I'm mostly developing on an Intel Mac (Xcode), and now and then I test my code on a PC (Windows, Visual Studio).