I have a multi-threaded process. Each thread is CPU bound (performs calculations) and also uses a lot of memory. The process starts with 100% cpu utilization according to resource monitor, but after several hours, cpu utilization starts to degrade, slowly. After 24 hours, it's on 90-95% and falling.
The question is - what should I look for, and what best-known-methods can I use to debug this?
Additional info:
I have enough RAM - most of it is unused at any given moment. According to perfmon - memory doesn't grow (so I don't think it's leaking). The code is a mix of .Net and native c++, with some data marshaling back and forth. I saw this on several different machines (servers with 24 logical cores). One thing I saw in perfmon - Modified Page List Bytes indicator increases over time as CPU utilization degrades.
Edit 1 One of the third party libraries that is used is openfst. Looks like it's very related to some mis-usage of that library. Specifically, I noticed that I have the following warnings: warning LNK4087: CONSTANT keyword is obsolete; use DATA
Edit 2
Since the question is closed, and wasn't reopened, I will write my findings and how the issue was solved in the body of the question (sorry) for future users. Turns out there is an openfst.def file that defines all the openfst FLAGS_* symbols to be used by consuming applications/dlls. I had to fix those to use the keyword "DATA" instead of "CONSTANT" (CONSTANT is obsolete because it's risky - more info: https://msdn.microsoft.com/en-us/library/aa271769(v=vs.60).aspx). After that - no more degradation in CPU utilization was observed. No more rise in "modified page list bytes" indicator. I suspect that it was related to the default values of the FLAGS (specifically the garbage collection flags - FLAGS_fst_default_cache_gc) which were non deterministic because of the misusage of CONSTANT keyword in openfst.def file.
Conclusion Understand your warnings! Eliminate as much of them as you can! Thanks.
For a non-obvious issue like this, you should also use a profiler that actually samples the underlying hardware counters in the CPU. Most profilers that I’m familiar with use kernel supplied statistics and not the underlying HW counters. This is especially true in Windows. (The reason is in part legacy, and in part that Windows wants its kernel statistics to be independent of hardware. PAPI APIs attempt to address this but are still relatively new.)
One of the best profilers is Intel’s VTune. Yes, I work for Intel but the internal HPC people use VTune as well. Unfortunately, it costs. If you’re a student, there are discounts. If not, there is a trial period.
You can find a lot of optimization and performance issue diagnosis information at software.intel.com. Here are pointers for optimization and for profiling. Even if you are not using an x86 architecture, the techniques are still valid.
As to what might be the issue, a degradation that slow is strange.
If you are using an x86 architecture, consider submitting a question in an Intel forum (e.g. "Intel® Clusters and HPC Technology" and "Software Tuning, Performance Optimization & Platform Monitoring").
Let us know what you ultimately find out.