How to pause the timer when benchmarking an already multithreaded function in google benchmark?

The documentation on GitHub has a section on multithreaded benchmarks. However, it requires placing the multithreaded code inside the benchmark definition, and the library then invokes that code from multiple threads itself.

I want to benchmark a function that creates threads internally. I am only interested in optimizing the multithreaded part, so I want to benchmark that part alone. I therefore want to pause the timer while the function's sequential code is running and while the internal threads are being created, destroyed, set up, or torn down.

There is 1 solution below

Use a thread barrier to wait until all threads have been created, have finished their setup, and so on. This solution uses boost::barrier, but you could also use std::barrier (available since C++20) or implement a custom barrier. Be careful if you implement one yourself, as it is easy to get wrong, but this answer seems to have it right.

Pass benchmark::State& state to your function and to its threads so that they can pause and resume the timer when needed.
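For reference, a custom barrier of the kind mentioned above could look roughly like the sketch below. It is a minimal version built on std::mutex and std::condition_variable, assuming a fixed number of participating threads. SimpleBarrier and barrier_keeps_phases_in_step are hypothetical names used only for illustration; they are not part of Boost, the standard library, or Google Benchmark.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier (illustration only).
class SimpleBarrier {
public:
    explicit SimpleBarrier(std::size_t count) : threshold_(count), waiting_(count) {
        assert(count > 0);
    }

    // Blocks until `count` threads have called wait(), then releases them all.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (--waiting_ == 0) {
            // Last arrival: open a new generation and wake everyone up.
            ++generation_;
            waiting_ = threshold_;
            cv_.notify_all();
        } else {
            // Comparing generations guards against spurious wakeups and
            // keeps the barrier safe to reuse right away.
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t waiting_;
    std::size_t generation_ = 0;
};

// Hypothetical self-check: every thread writes the current phase number,
// waits, and then verifies that all other threads wrote the same phase.
bool barrier_keeps_phases_in_step(int num_threads, int num_phases) {
    SimpleBarrier barrier(static_cast<std::size_t>(num_threads));
    std::vector<int> progress(num_threads, 0);
    std::vector<char> ok(num_threads, 1);

    auto routine = [&](int id) {
        for (int phase = 1; phase <= num_phases; phase++) {
            progress[id] = phase;
            barrier.wait();  // all writes for this phase are done
            for (int p : progress) {
                if (p != phase) ok[id] = 0;
            }
            barrier.wait();  // all reads are done before the next writes
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int i = 0; i < num_threads; i++) threads.emplace_back(routine, i);
    for (std::thread& t : threads) t.join();

    for (char c : ok) {
        if (!c) return false;
    }
    return true;
}
```

Calling barrier_keeps_phases_in_step(8, 100) exercises the barrier across many reuse cycles. The generation counter is the part that is easiest to get wrong: without it, a fast thread that re-enters wait() can be mistaken for a waiter from the previous round.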

#include <functional>  // std::ref
#include <thread>
#include <vector>

#include <benchmark/benchmark.h>
#include <boost/thread/barrier.hpp>

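void work() {
    // Unsigned arithmetic wraps around; overflowing a signed int would be undefined behavior.
    volatile unsigned int sum = 0;
    for (unsigned int i = 0; i < 100'000'000; i++) {
        sum += i;
    }
}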

static void thread_routine(boost::barrier& barrier, benchmark::State& state, int thread_id) {
    // do setup here, if needed
    barrier.wait();  // wait until each thread is created
    if (thread_id == 0) {
        state.ResumeTiming();
    }
    barrier.wait();  // wait until the timer is started before doing the work

    // do some work
    work();

    barrier.wait();  // wait until each thread completes the work
    if (thread_id == 0) {
        state.PauseTiming();
    }
    barrier.wait();  // wait until the timer is stopped before destructing the thread
    // do teardown here, if needed
}

void f(benchmark::State& state) {
    const int num_threads = 1000;
    boost::barrier barrier(num_threads);
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int i = 0; i < num_threads; i++) {
        threads.emplace_back(thread_routine, std::ref(barrier), std::ref(state), i);
    }
    for (std::thread& thread : threads) {
        thread.join();
    }
}

static void BM_AlreadyMultiThreaded(benchmark::State& state) {
    for (auto _ : state) {
        state.PauseTiming();
        f(state);
        state.ResumeTiming();
    }
}

BENCHMARK(BM_AlreadyMultiThreaded)->Iterations(10)->Unit(benchmark::kMillisecond)->MeasureProcessCPUTime(); // NOLINT(cert-err58-cpp)
BENCHMARK_MAIN();

On my machine, this code outputs (skipping the header):

---------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations
---------------------------------------------------------------------------------------------
BM_AlreadyMultiThreaded/iterations:10/process_time       1604 ms       200309 ms           10

If I comment out all the state.PauseTiming() / state.ResumeTiming() calls, it outputs:

---------------------------------------------------------------------------------------------
Benchmark                                                   Time             CPU   Iterations
---------------------------------------------------------------------------------------------
BM_AlreadyMultiThreaded/iterations:10/process_time       1680 ms       200102 ms           10

I consider the difference of 76 ms in real time (1604 ms vs. 1680 ms) and 207 ms in CPU time (200309 ms vs. 200102 ms) to be statistically significant rather than noise, which supports the hypothesis that this example works correctly.