tcmalloc huge performance variance


Our multi-threaded server has hundreds of connection threads that are responsible for I/O handling and replying to incoming requests.

There is another asynchronous thread that runs relatively heavy tasks with many allocations from time to time (say every few seconds).

Once I converted that thread to a small thread pool (i.e. those tasks now run from a different thread each time), our server usually has the same CPU usage, but it can suddenly reach a state where allocations across all operations take much more time and the overall CPU usage of the server almost doubles, from 2 cores to 3.7 cores.

My main theory so far is that I somehow changed the access pattern for the tcmalloc library, and that this causes the random CPU spikes. What should I look at in the tcmalloc stats in order to confirm this theory? Can it be that the same code, now running from different threads (but not simultaneously), causes tcmalloc to allocate from the central cache more than from the thread cache?

There is 1 best solution below

As several commenters have suggested, false sharing might be the problem. Finding false sharing is difficult and not well supported by current tools. My research group has published these research papers on the topic; at a minimum, they provide an excellent introduction to the problem of false sharing and why it is so insidious.

The tools corresponding to these research papers are available on GitHub: Sheriff, Predator.

While you could try to use one of these tools to find the problem, the easiest thing would be to give Hoard a try. Hoard is a fast, scalable malloc replacement whose design reduces the risk of allocator-induced false sharing. If replacing tcmalloc with Hoard doesn't solve your problem, then it might make sense to pursue other avenues.
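
For readers who have not run into false sharing before, here is a minimal C++ sketch of the effect. It is not taken from Sheriff, Predator, Hoard, or tcmalloc; the struct names, the `hammer` and `same_cache_line` helpers, and the 64-byte cache-line size are all assumptions made for illustration. Two threads each update only their own counter, yet in the packed layout both counters sit on one cache line, so the cores keep invalidating each other's copy of that line.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumed cache-line size: 64 bytes is typical on x86-64, but not universal.
constexpr std::size_t kCacheLine = 64;

// Packed layout: both counters land on the same cache line
// (the struct itself starts on a line boundary, b follows a directly).
struct alignas(kCacheLine) PackedCounters {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// Padded layout: each counter is aligned to its own cache line.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<std::uint64_t> a{0};
    alignas(kCacheLine) std::atomic<std::uint64_t> b{0};
};

// Returns true if both addresses fall inside the same kCacheLine-sized block.
inline bool same_cache_line(const void* p, const void* q) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine ==
           reinterpret_cast<std::uintptr_t>(q) / kCacheLine;
}

// Runs two threads; each one increments only its own counter.
// No data is logically shared, so any slowdown with PackedCounters
// comes from the shared cache line, not from the program's logic.
template <typename Counters>
void hammer(Counters& c, std::uint64_t iterations) {
    std::thread t1([&c, iterations] {
        for (std::uint64_t i = 0; i < iterations; ++i)
            c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&c, iterations] {
        for (std::uint64_t i = 0; i < iterations; ++i)
            c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
}
```

Timing `hammer` on a `PackedCounters` versus a `PaddedCounters` instance with a large iteration count will often show the packed version running noticeably slower on a multi-core machine, though the size of the gap depends on the hardware. An allocator can produce the packed situation without any such struct in your code, simply by handing small objects from the same cache line to different threads, which is the allocator-induced case mentioned above.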