We are dealing with a nasty lock contention issue in a high-traffic ASP.NET Core MVC app on .NET 6, hosted in-process in IIS on Windows Server.
We are trying to enable a feature that makes an HTTP call to an internal service. When we enable it, we see increased lock contention (from dotnet-counters), CPU spikes (from 20% to 100%), RAM spikes (from 4 GB to 12 GB), a jump in the number of thread pool threads (from ~60 to ~310), and, unsurprisingly, the app struggles to serve incoming requests. The issue appears ONLY in the production environment and unfortunately we haven't been able to reproduce it in any other environment (local, UAT, staging).
We've taken countless dumps and traces to analyze and identify the issue; however, all of them point to a general "increased lock contention" problem, and the hot path is always this stack trace:
ntdll.dll!NtRemoveIoCompletion()
KERNELBASE.dll!GetQueuedCompletionStatus()
System.Private.CoreLib.dll!00007ff803838421()
[Managed to Native Transition]
System.Private.CoreLib.dll!System.Threading.LowLevelLifoSemaphore.WaitForSignal(int timeoutMs = 0x00004e20)
System.Private.CoreLib.dll!System.Threading.LowLevelLifoSemaphore.Wait(int timeoutMs, bool spinWait)
System.Private.CoreLib.dll!System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
System.Private.CoreLib.dll!System.Threading.Thread.StartCallback()
[Native to Managed Transition]
kernel32.dll!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
In fact, the dumps taken before and after show an enormous increase in the number of threads, all of them IOCP threads waiting in GetQueuedCompletionStatus().
Unfortunately we weren't able to identify the worker threads that spawn the IOCP threads, and we are not aware of any way to associate an IOCP thread with its completion port and the relevant worker thread.
We are blindly trying changes to our async implementations, but still no luck.
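For example, one classic pattern we have been checking our async code for is sync-over-async, where blocking on a Task ties up a thread pool worker for the whole duration of the HTTP call (a hypothetical sketch for illustration, not code from our app):

using System.Net.Http;
using System.Threading.Tasks;

public class SyncOverAsyncExample
{
    // BAD: .Result (like .Wait() or .GetAwaiter().GetResult()) blocks the calling
    // thread until the HTTP call completes, forcing the pool to inject more threads
    // under load.
    public string GetDataBlocking(HttpClient client) =>
        client.GetStringAsync("api/data").Result; // "api/data" is a placeholder

    // Awaiting instead releases the thread back to the pool while the call is in flight.
    public async Task<string> GetDataAsync(HttpClient client) =>
        await client.GetStringAsync("api/data"); // "api/data" is a placeholder
}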
We have also tried all the common HttpClient patterns, just to make sure we are not missing something obvious: a long-lived static HttpClient, short-lived named HttpClient instances via IHttpClientFactory, and a custom pool of long-lived HttpClient instances.
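For reference, the IHttpClientFactory variant looked roughly like this (a stripped-down sketch; the client name, base address and timeout are placeholders, not our real configuration):

using System;
using Microsoft.Extensions.DependencyInjection;

public static class HttpClientRegistration
{
    public static IServiceCollection AddInternalServiceClient(this IServiceCollection services)
    {
        // Named client, later resolved via IHttpClientFactory.CreateClient("InternalService").
        services.AddHttpClient("InternalService", client =>
        {
            client.BaseAddress = new Uri("https://internal-service.example/"); // placeholder
            client.Timeout = TimeSpan.FromSeconds(10);                         // placeholder
        });

        return services;
    }
}

// Consumers then do something like:
//   var client = _httpClientFactory.CreateClient("InternalService");
//   using var response = await client.GetAsync("api/data");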
We've raised the minimum worker and IOCP threads from the defaults to 200 and then to 300, still with no luck or any change in the app's behavior.
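For completeness, this is the kind of startup code used to raise the minimums (a minimal sketch; 300/300 matches our last attempt):

using System;
using System.Threading;

public static class ThreadPoolTuning
{
    public static void RaiseMinimums()
    {
        // The defaults are roughly the logical processor count.
        ThreadPool.GetMinThreads(out int workerMin, out int iocpMin);
        Console.WriteLine($"Current minimums: worker={workerMin}, IOCP={iocpMin}");

        // SetMinThreads returns false if the requested values are out of range.
        if (!ThreadPool.SetMinThreads(workerThreads: 300, completionPortThreads: 300))
        {
            Console.WriteLine("Failed to raise the thread pool minimums.");
        }
    }
}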
Running !syncblk does not reveal anything (no synchronous locks are detected), and neither does any other common method of identifying locks/contention from dumps or traces.
I would really appreciate any feedback on this, since we've been banging our heads against the wall for more than a month now. Thank you! :)
See the comment by Palec: Is ConfigureAwait(false) relevant in ASP.NET Core?
I think you should give ConfigureAwait(false) a try; this has been the solution many times before.
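Something along these lines (a minimal sketch; the client class and endpoint are made up for illustration, and note that ASP.NET Core itself has no SynchronizationContext, so ConfigureAwait(false) mostly matters for code that is also reused outside the web host):

using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class InternalServiceClient
{
    private readonly HttpClient _client;

    public InternalServiceClient(HttpClient client) => _client = client;

    public async Task<string> GetDataAsync(CancellationToken ct)
    {
        // ConfigureAwait(false) tells the awaiter not to capture the current context.
        using var response = await _client
            .GetAsync("api/data", ct) // placeholder endpoint
            .ConfigureAwait(false);

        response.EnsureSuccessStatusCode();

        return await response.Content
            .ReadAsStringAsync(ct)
            .ConfigureAwait(false);
    }
}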