CPU utilization degradation over time

Question


500 Views Asked by lev haikin At 28 June 2025 at 11:51

I have a multi-threaded process. Each thread is CPU-bound (it performs calculations) and also uses a lot of memory. The process starts at 100% CPU utilization according to Resource Monitor, but after several hours utilization slowly starts to degrade. After 24 hours it is at 90-95% and still falling.

The question is: what should I look for, and what best-known methods can I use to debug this?

Additional info:

I have enough RAM; most of it is unused at any given moment. According to perfmon, memory doesn't grow (so I don't think it's leaking). The code is a mix of .NET and native C++, with some data marshaling back and forth. I saw this on several different machines (servers with 24 logical cores). One thing I did see in perfmon: the Modified Page List Bytes counter increases over time as CPU utilization degrades.

Edit 1

One of the third-party libraries in use is openfst. The problem looks closely related to some misuse of that library. Specifically, I noticed that I get the following warning: warning LNK4087: CONSTANT keyword is obsolete; use DATA

Edit 2

Since the question is closed and wasn't reopened, I will write my findings and how the issue was solved in the body of the question (sorry) for future users. It turns out there is an openfst.def file that defines all the openfst FLAGS_* symbols to be used by consuming applications/DLLs. I had to change those entries to use the keyword "DATA" instead of "CONSTANT" (CONSTANT is obsolete because it's risky; more info: https://msdn.microsoft.com/en-us/library/aa271769(v=vs.60).aspx). After that, no more degradation in CPU utilization was observed, and the "Modified Page List Bytes" counter stopped rising. I suspect the problem was related to the default values of the FLAGS (specifically the garbage collection flags, such as FLAGS_fst_default_cache_gc), which were non-deterministic because of the misuse of the CONSTANT keyword in the openfst.def file.
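For readers wondering why CONSTANT can give a flag an unpredictable value, here is a rough simulation in portable C++. It does not use openfst or a real DLL. It assumes, based on my reading of Microsoft's description of the keyword, that with CONSTANT the undecorated symbol can end up naming the import-table slot (which holds the variable's address) rather than the variable itself. The names real_flag, iat_slot, read_through_slot, read_slot_as_flag and set_real_flag are all hypothetical and exist only for this sketch.

```cpp
#include <cstring>

// Hypothetical stand-in for a flag variable that lives inside a DLL.
static bool real_flag = false;

// Hypothetical stand-in for the import-table slot: it stores the flag's
// ADDRESS, not the flag's value.
static bool* iat_slot = &real_flag;

// Correct access: follow the pointer stored in the slot.
bool read_through_slot() {
    return *iat_slot;
}

// Mistaken access: treat the first byte of the slot itself as the flag.
// The result depends on where real_flag happens to be placed in memory,
// so it can differ between builds, machines, and runs, and it never
// tracks the real value.
bool read_slot_as_flag() {
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &iat_slot, sizeof first_byte);
    return first_byte != 0;
}

// Lets a caller change the "DLL-side" value, as the library itself would.
void set_real_flag(bool value) {
    real_flag = value;
}
```

Calling set_real_flag(true) changes what read_through_slot() returns but leaves read_slot_as_flag() unchanged, which is the kind of behavior that would make a default such as a cache garbage collection flag look arbitrary to the consuming code.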

Conclusion: Understand your warnings! Eliminate as many of them as you can! Thanks.

Original Q&A


**Taylor Kidd** · Answer 1

For a non-obvious issue like this, you should also use a profiler that actually samples the underlying hardware counters in the CPU. Most profilers that I’m familiar with use kernel-supplied statistics and not the underlying HW counters. This is especially true on Windows. (The reason is in part legacy, and in part that Windows wants its kernel statistics to be independent of hardware. The PAPI APIs attempt to address this but are still relatively new.)

One of the best profilers is Intel’s VTune. Yes, I work for Intel, but the internal HPC people use VTune as well. Unfortunately, it costs money. If you’re a student, there are discounts. If not, there is a trial period.

You can find a lot of optimization and performance issue diagnosis information at software.intel.com. Here are pointers for optimization and for profiling. Even if you are not using an x86 architecture, the techniques are still valid.

As to what might be the issue, a degradation that slow is strange.

How often do you use new memory or access old memory? At what rate? Even if the rate is very slow, you might still be slowly using up a resource, e.g. pages.
What are your memory access patterns? Do they change over time? How rapidly? Perhaps your memory accesses are spreading out over time, resulting in more cache misses.
Perhaps your partitioning of the problem space is such that you have entered a new computational domain and there is no real pathology.
Look at whether there are periodic maintenance activities that take place over a longer interval, though this would result in a periodic degradation, say every 24 hours. This doesn’t sound like your situation, since what you are experiencing is a gradual degradation.
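To get a feel for the access-pattern point above, a small experiment like the following can help. It is only a sketch, not a substitute for a hardware-counter profiler: it sums the same array once per call, either sequentially (stride 1) or in large jumps, so the amount of work is identical and only the access order changes. The function names sum_with_stride and time_pass_ms are made up for this example, and stride is assumed to be at least 1.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums every element exactly once, visiting them in jumps of `stride`.
// stride == 1 is a plain sequential scan; larger strides touch a new
// cache line on (almost) every access. Assumes stride >= 1.
std::uint64_t sum_with_stride(const std::vector<std::uint32_t>& data,
                              std::size_t stride) {
    std::uint64_t total = 0;
    for (std::size_t start = 0; start < stride; ++start) {
        for (std::size_t i = start; i < data.size(); i += stride) {
            total += data[i];
        }
    }
    return total;
}

// Times one full pass in wall-clock milliseconds and hands the sum back
// through out_sum so the compiler cannot discard the work.
double time_pass_ms(const std::vector<std::uint32_t>& data,
                    std::size_t stride,
                    std::uint64_t& out_sum) {
    const auto t0 = std::chrono::steady_clock::now();
    out_sum = sum_with_stride(data, stride);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Running time_pass_ms on a buffer much larger than the last-level cache, first with stride 1 and then with a stride of a few thousand elements, typically shows the strided pass taking noticeably longer even though both return the same sum. If your own workload drifts from the first pattern toward the second over hours, throughput can fall without any leak showing up in memory counters.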

If you are using an x86 architecture, consider submitting a question in an Intel forum (e.g. "Intel® Clusters and HPC Technology" and "Software Tuning, Performance Optimization & Platform Monitoring").

Let us know what you ultimately find out.
