When examining the output of my benchmarks with the Google Benchmark framework, I observed that the standard deviation of the measured CPU time was in many cases significantly larger than the standard deviation of the measured real time.
Why is that? Or is this just measurement error? I am quite surprised, because I expected the CPU time to be the more reproducible of the two.
This is a general observation on my system, but here is a simple example nevertheless:
#include <benchmark/benchmark.h>
#include <cmath>

static void BM_SineEvaluation(benchmark::State& state)
{
    for (auto _ : state)
    {
        double y = 1.0;
        for (size_t i = 0; i < 100; ++i)
        {
            // sin^2(y) + cos^2(y) is 1 up to rounding, so the multiply is essentially
            // a no-op; the loop simply generates a fixed amount of floating-point work.
            y *= std::sin(y) * std::sin(y) + std::cos(y) * std::cos(y);
            y += std::sin(std::cos(y));
        }
        benchmark::DoNotOptimize(y);
    }
}
BENCHMARK(BM_SineEvaluation);
BENCHMARK_MAIN();
The example does not even contain heap allocations, and none of the sin/cos calls are optimized away by the compiler. That's all of the code. The time measurements are done entirely inside the Google Benchmark library, which is openly available on GitHub, but I haven't looked into its implementation so far.
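To make explicit what I mean by the two columns: my (possibly naive) mental model is that the Time column comes from a monotonic wall clock while the CPU column comes from per-process CPU accounting. The following is only a minimal sketch of that model, not Google Benchmark's actual measurement code; it uses std::chrono::steady_clock and std::clock just to keep it portable.

#include <chrono>
#include <cmath>
#include <ctime>
#include <iostream>

// Sketch only: time one batch of work with both a wall clock and the process
// CPU clock. Real frameworks use platform-specific APIs for the CPU part
// (e.g. GetProcessTimes on Windows, clock_gettime on POSIX); note that MSVC's
// std::clock actually reports wall time rather than CPU time.
int main()
{
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    double y = 1.0;
    for (int i = 0; i < 1000000; ++i)
        y += std::sin(std::cos(y));

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    const auto wall_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                             wall_end - wall_start).count();
    const double cpu_ns = 1e9 * double(cpu_end - cpu_start) / CLOCKS_PER_SEC;

    std::cout << "real: " << wall_ns << " ns, cpu: " << cpu_ns
              << " ns (y = " << y << ")\n";
}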
When running the program with the command-line arguments --benchmark_repetitions=50 --benchmark_report_aggregates_only=true, I get output like this:
----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
BM_SineEvaluation_mean        11268 ns      11270 ns      64000
BM_SineEvaluation_median      11265 ns      11230 ns      64000
BM_SineEvaluation_stddev         11 ns         90 ns      64000
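(As far as I understand, the _mean/_median/_stddev rows are just the sample statistics taken over the 50 repetitions; conceptually something like the sketch below, although I have not checked Google Benchmark's actual aggregation code.)

#include <algorithm>
#include <cmath>
#include <vector>

// Sample statistics over per-repetition timings (illustration only).
double Mean(const std::vector<double>& v)
{
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / v.size();
}

double Median(std::vector<double> v)
{
    std::sort(v.begin(), v.end());
    const auto n = v.size();
    return n % 2 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

double StdDev(const std::vector<double>& v)
{
    const double m = Mean(v);
    double sq = 0.0;
    for (double x : v) sq += (x - m) * (x - m);
    return std::sqrt(sq / (v.size() - 1));  // sample standard deviation
}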
I am using Google Benchmark v1.4.1 on a really old Intel Core i7 920 (Bloomfield), compiled with the Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24218.1 for x86 (Visual Studio 2015) and /O2.
Edit: I did further measurements on Fedora 28 on an Intel i5-4300U CPU with gcc 8.1.1 (which is smart enough to combine the sin/cos pairs into sincos calls at -O2) and found contrasting behavior:
----------------------------------------------------------------
Benchmark                         Time           CPU Iterations
----------------------------------------------------------------
BM_SineEvaluation_mean        54642 ns      54556 ns      12350
BM_SineEvaluation_median      54305 ns      54229 ns      12350
BM_SineEvaluation_stddev        946 ns        888 ns      12350
When omitting -O2 (which is closer to the MSVC case because it leaves separate sin/cos calls), I still get the same qualitative result: the standard deviation of the real time is again larger than the standard deviation of the CPU time.
I am not quite sure what conclusion to draw from this. Does it mean that the CPU time measurements on Windows are simply less precise?
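If it is a matter of clock granularity, I suppose I could check how coarse the Windows CPU-time accounting actually is, for example by spinning until GetProcessTimes reports a change and looking at the step size. A rough sketch of what I have in mind (Windows only; I have not run this yet):

#include <windows.h>
#include <cstdint>
#include <iostream>

// Convert a FILETIME (100-nanosecond units) into a 64-bit tick count.
static std::uint64_t ToTicks(const FILETIME& ft)
{
    ULARGE_INTEGER u;
    u.LowPart = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart;
}

// Current user-mode CPU time of this process, in 100 ns ticks.
static std::uint64_t UserCpuTicks()
{
    FILETIME creationTime, exitTime, kernelTime, userTime;
    GetProcessTimes(GetCurrentProcess(), &creationTime, &exitTime,
                    &kernelTime, &userTime);
    return ToTicks(userTime);
}

int main()
{
    // Busy-wait until the reported CPU time advances, then print the step.
    const std::uint64_t start = UserCpuTicks();
    std::uint64_t now = start;
    while (now == start)
        now = UserCpuTicks();
    std::cout << "Smallest observed CPU-time step: "
              << (now - start) * 100 << " ns\n";
}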