Why is the number of instructions non-deterministic in Linux performance counters?


I want to profile the runtimes of applications whose binaries will actually be run under a simulator (NS-3/DCE), so I wanted to use the Linux performance counters. I expected the instruction count of an application with no source of non-determinism to be deterministic. According to the Linux performance counters, I couldn't have been more wrong. Here is a simple example:

$ (perf stat -c -- sleep 1 2>&1 && perf stat -c -- sleep 1 2>&1) |grep instructions
        669218 instructions              #    0,61  insns per cycle
        682286 instructions              #    0,58  insns per cycle
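To repeat this experiment more than twice and look at the spread, the runs can be scripted. The following is a minimal sketch, assuming perf is installed and that perf stat -x, prints CSV lines whose first three fields are the counter value, the unit, and the event name (the remaining fields differ between perf versions). The helper names parse_perf_csv and count_instructions are hypothetical, not part of any perf tooling.

```python
import subprocess


def parse_perf_csv(text, event="instructions"):
    """Return the count for `event` from `perf stat -x,` output, or None if absent."""
    for line in text.splitlines():
        fields = line.split(",")
        # Assumed layout: value,unit,event,... (later fields vary by perf version).
        if len(fields) < 3:
            continue
        name = fields[2]
        # Accept an exact match, or perf adding a modifier such as ":u" on its own.
        if name == event or name.startswith(event + ":"):
            value = fields[0]
            # Values such as "<not counted>" or "<not supported>" are not numbers.
            return int(value) if value.isdigit() else None
    return None


def count_instructions(cmd, event="instructions", runs=5):
    """Run `perf stat -x, -e EVENT -- cmd` `runs` times and collect the counts."""
    counts = []
    for _ in range(runs):
        result = subprocess.run(
            ["perf", "stat", "-x,", "-e", event, "--", *cmd],
            capture_output=True,
            text=True,
            check=True,
        )
        # perf stat writes its report to stderr, not stdout.
        counts.append(parse_perf_csv(result.stderr, event))
    return counts
```

For example, count_instructions(["sleep", "1"]) would return five counts (None for any run where the event was not counted); comparing max(counts) - min(counts) across commands shows how much each one varies.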

1) What is the source of this non-determinism? Does it stem from low-level branch prediction or other engines in the CPU?

2) Is there a way to know the number of instructions fed to the CPU (as opposed to the instruction count in the example output), so that I can measure the amount of executed code deterministically?

Best answer:

Summary:

1) The non-determinism comes from variation in the sleep 1 command itself, not from branch prediction or other microarchitectural features.

2) You can find the number of instructions fetched by using a hardware event counter, if your CPU provides one. However, that count will vary more than the number of instructions retired (which is what perf typically reports as instructions).

Details:

The sleep command is not a good test case if you want a deterministic number of instructions to execute. It executes a non-deterministic number of instructions because what the kernel does varies slightly from run to run.

You can choose whether to count user-mode or kernel-mode instructions with the event modifiers instructions:u (user mode) and instructions:k (kernel mode). For two runs of:

perf stat -e instructions:k,instructions:u,instructions sleep 1

I get the following results:

Performance counter stats for 'sleep 1':

       373,044 instructions:k            #    0.00  insns per cycle        
       199,795 instructions:u            #    0.00  insns per cycle        
       572,839 instructions              #    0.00  insns per cycle        

   1.001018153 seconds time elapsed

and

Performance counter stats for 'sleep 1':

       379,722 instructions:k            #    0.00  insns per cycle        
       199,970 instructions:u            #    0.00  insns per cycle        
       579,519 instructions              #    0.00  insns per cycle        

   1.000986201 seconds time elapsed

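As you can see, the actual elapsed time of sleep 1 varies slightly, and that is the source of the non-determinism. However, the number of user-mode instructions varies much less than the number of kernel-mode instructions: between these two runs the user-mode count changes by 175, while the kernel-mode count changes by 6,678.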
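To put the user-mode versus kernel-mode difference in relative terms, the counts from the two runs above can be compared directly. This is only a small arithmetic sketch using the numbers printed in this answer; relative_spread is a hypothetical helper that expresses the difference as a percentage of the mean of the two counts.

```python
# Counts copied from the two `perf stat` runs shown above.
run1 = {"instructions:k": 373_044, "instructions:u": 199_795, "instructions": 572_839}
run2 = {"instructions:k": 379_722, "instructions:u": 199_970, "instructions": 579_519}


def relative_spread(a, b):
    """Absolute difference between two counts, as a percentage of their mean."""
    return abs(a - b) / ((a + b) / 2) * 100


for event in run1:
    diff = abs(run1[event] - run2[event])
    pct = relative_spread(run1[event], run2[event])
    print(f"{event:16} diff={diff:6,} ({pct:.2f}%)")
```

With these numbers the kernel-mode count differs by about 1.77% between runs, while the user-mode count differs by only about 0.09%.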