How do I profile multithreading problems?


This is the first time I am trying to profile a multi-threaded program.

I suspect the problem is that it is waiting for something, but I have no clue what: the program never reaches 100% CPU, GPU, RAM, or I/O use.

Until recently, I've only worked on single-threaded projects, or ones where the threads were very simple (for example, an extra thread just to keep the UI from locking up while the program works, or, in a game engine I once made, a separate thread to decode .XM and .IT music files, so the main thread could do everything else while that thread ran on another core).

This program has several threads, and they don't do parallel work on the same tasks; each thread has its own completely separate purpose (for example, one thread is dedicated to handling all sound-related API calls to the OS).
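Roughly, each of those dedicated threads looks something like the sketch below (heavily simplified, and the names are made up; the real code is more involved). Each thread owns a queue and sleeps on it until some other thread posts work for it:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// One of the dedicated worker threads: it owns a single responsibility
// (e.g. all sound-related OS calls) and sleeps on its queue while idle.
class DedicatedWorker {
public:
    DedicatedWorker() : worker_([this] { Run(); }) {}

    ~DedicatedWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    // Other threads hand work over instead of touching the sound API directly.
    void Post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // The thread spends most of its life blocked here, which is
                // part of why overall CPU use never gets near 100%.
                wake_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // e.g. one sound-related API call
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stop_ = false;
    std::thread worker_;
};

int main() {
    DedicatedWorker sound_thread;
    sound_thread.Post([] { /* call into the OS sound API here */ });
}
```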

I downloaded the Microsoft performance tools; there is a blog by an ex-Valve employee explaining how to use them for this, and although I managed to record some profiles and whatnot, I don't really understand what I am seeing. It is just a bunch of pretty graphs to me (except the CPU usage graph, which I already knew from doing sample-based profiling on single-threaded apps). So how do I find out why the program is waiting on something? How do I find what it is waiting for? How do I find which thread is blocking the others?


There are 2 best solutions below

  1. The Performance Wizard in the Visual Studio Performance and Diagnostics Hub has a "Resource contention data" profiling mode, which lets you analyze concurrency contention among threads, i.e. how the overall performance of a program is affected by threads waiting on other threads (the sketch after this list shows a minimal program exhibiting exactly that). Please refer to this blog post for more details.
  2. PerfView is an extremely powerful profiling tool that lets you analyze the impact of service threads and tasks on the overall performance of the program. The PerfView Tutorial is available here.
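As an illustration of what "resource contention" means in practice, here is a minimal, contrived C++ program (not taken from either tool's documentation) in which several threads serialize on one mutex, so each spends most of its wall-clock time blocked on the others rather than using the CPU:

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex shared_lock;
long long counter = 0;

// Every worker funnels through the same mutex, so with several workers each
// one spends most of its wall-clock time blocked, not computing.
void Worker() {
    for (int i = 0; i < 1000000; ++i) {
        std::lock_guard<std::mutex> guard(shared_lock);  // contended
        ++counter;                                        // tiny amount of real work
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(Worker);
    for (auto& t : threads) t.join();
    // A contention profile attributes the blocked time to this lock and the
    // stacks that take it; a plain CPU sampler mostly shows idle cores.
}
```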

I look at it as an alternation between two things:

a) measuring overall time, for which all you need is some kind of timer, and

b) finding speedups, which does not mean measuring, in spite of what a lot of people have been told.

Each time you find a speedup, you time the results and do it again. That's the alternation. To find speedups, the method I and many other people use is random pausing. The idea is, you get the program running under a debugger and manually interrupt it several times. Each time, you examine the state of every thread, including its call stack. It is very crude, and it is very effective.

The reason this works is that the only way the program can go faster is if it is doing some activity you can remove, and if removing that activity would save a certain fraction of time, you are at least that likely to see it on any given pause. This works whether the program is doing I/O, waiting for something, or computing. It sees things that profilers do not expose, because profilers produce summaries, and speedups can easily hide in summaries.
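As a contrived illustration of why the stacks are so informative, consider two threads sharing a lock where one of them does expensive work while holding it (the names below are invented for the example):

```cpp
#include <chrono>
#include <mutex>
#include <string>
#include <thread>

std::mutex log_mutex;
std::string log_buffer;

// Stand-in for accidentally expensive work done while holding the lock.
std::string SlowFormat(int i) {
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    return "event " + std::to_string(i) + "\n";
}

void ThreadA() {
    for (int i = 0; i < 1000; ++i) {
        std::lock_guard<std::mutex> guard(log_mutex);
        log_buffer += SlowFormat(i);  // a random pause usually lands A here
    }
}

void ThreadB() {
    for (int i = 0; i < 1000; ++i) {
        // A random pause usually finds B blocked right here, waiting for A.
        std::lock_guard<std::mutex> guard(log_mutex);
        log_buffer += "B was here\n";
    }
}

int main() {
    std::thread a(ThreadA), b(ThreadB);
    a.join();
    b.join();
}
```

On almost every pause, one thread's stack ends inside the lock acquisition and the other's ends inside SlowFormat(), so the fix (make the formatting cheaper, or do it outside the lock) can be read straight off the stacks instead of inferred from a summary.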