The Google Benchmark documentation on GitHub has a section on multithreaded benchmarks. However, it requires placing the multithreaded code inside the benchmark definition, and the library then invokes that code from multiple threads itself.
I want to benchmark a function that creates threads internally. I am only interested in optimizing the multithreaded part, so I want to benchmark that part alone. Thus I want to pause the timer while the function's sequential code is running, while the internal threads are being created / destroyed, and while they do setup / teardown.
Use a thread barrier, a synchronization primitive, to wait until all threads have been created, have finished setup, etc. This solution uses `boost::barrier`, but one could also use `std::barrier` since C++20, or implement a custom barrier. Be careful if you implement it yourself, as it's easy to get wrong, but this answer seems to have it right.

Pass `benchmark::State& state` to your function and to your threads so they can pause / unpause the timer when needed.

On my machine, this code outputs (skipping the header):
If I comment out all the `state.PauseTimer()` / `state.ResumeTimer()` calls, it outputs:

I consider the difference of 80 ms in real time / 200 ms in CPU time to be statistically significant rather than noise, which supports the hypothesis that this example works correctly.