I'm using Google Benchmark to time a function, but I need to see how it performs when working with a "cold" cache. I know the library will run a function repeatedly until the timing is steady, but I would like that steady timing to reflect a cold cache on every call. This is roughly what my benchmark looks like:
template <class ...Args>
void BM_MyFunc(benchmark::State& state, Args&&... args) {
    auto args_tuple = std::make_tuple(std::move(args)...);
    const int arg = std::get<0>(args_tuple);
    for (auto _ : state) {
        // flush cache by accessing a huge array
        for (int i = 0; i < N; i++) {
            huge_array[i] = rand();
        }
        my_func(arg);
    }
}
BENCHMARK_CAPTURE(BM_MyFunc, my_func_with_42, 42);
The problem is that putting the cache flushing inside for (auto _ : state) means the flush itself is included in the timing results. If I put the flush outside that loop, the cache is flushed only once, and the library's repeated iterations then warm the cache for my_func, so it is no longer working with a cold cache.
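To make the goal concrete, here is a rough hand-rolled sketch of the behavior I'm after, written without the benchmark library. The names flush_cache and time_cold_calls are made up for illustration, the my_func body is a trivial stand-in, and I'm assuming the buffer passed in is larger than the last-level cache. The flush runs before every call, but only the call itself is timed:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the real function under test.
int my_func(int arg) { return arg * 2; }

// Evict cached data by writing to every element of a large buffer.
// Assumption: the caller sizes huge_array to exceed the last-level cache.
void flush_cache(std::vector<int>& huge_array) {
    for (std::size_t i = 0; i < huge_array.size(); ++i) {
        huge_array[i] = std::rand();
    }
}

// Returns the average time per call in nanoseconds, counting only
// my_func and excluding the per-call flush.
double time_cold_calls(int arg, int iterations, std::vector<int>& huge_array) {
    using clock = std::chrono::steady_clock;
    std::chrono::duration<double, std::nano> total{0};
    volatile int sink = 0;  // keeps the call from being optimized away
    for (int i = 0; i < iterations; ++i) {
        flush_cache(huge_array);           // untimed per-call setup
        const auto start = clock::now();
        sink = my_func(arg);               // the only timed region
        total += clock::now() - start;
    }
    (void)sink;
    return total.count() / iterations;
}
```

Calling time_cold_calls(42, 100, buffer) with, say, a 64 MiB buffer (a guess at "bigger than my cache") gives the kind of number I want, but without the library's automatic iteration count and reporting.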
Is there some way to have a "per function-call" setup that doesn't contribute to the timing of said function? The documentation doesn't seem to cover this particular use case.