I have two different Windows Services written in fully managed C# (i.e., no third-party libraries or P/Invoke) that randomly crash with the same call stack on an unmanaged thread. The crash can take anywhere from a few hours to several weeks to occur. It only happens in production, so I am unable to debug it other than by using crash dumps and WinDbg. This is the exception information from running !analyze -v:
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00007ffd56b56700 (ntdll!memcpy+0x0000000000000040)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 000001f76dea0f78
Attempt to read from address 000001f76dea0f78
PROCESS_NAME: *Removed*
READ_ADDRESS: 000001f76dea0f78
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.
EXCEPTION_CODE_STR: c0000005
EXCEPTION_PARAMETER1: 0000000000000000
EXCEPTION_PARAMETER2: 000001f76dea0f78
STACK_TEXT:
ntdll!memcpy+0x40
KERNELBASE!StmWrite+0x72
KERNELBASE!PerfpAddCounterToStream+0x89
KERNELBASE!PerfpAddInstanceToStream+0xcf
KERNELBASE!PerfpAddQueryItemToStream+0x15d
KERNELBASE!PerfpNotifyAndCollect+0x113
KERNELBASE!PerfpNotificationCallback+0xa6
ntdll!TppIopExecuteCallback+0x129
ntdll!TppWorkerThread+0x3c8
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21
Both services do very different things and are written in fully managed code, so I cannot figure out where this unmanaged thread is coming from or what it is trying to do. I have also been unable to find any documentation for the KERNELBASE functions on the stack, so I am running out of ideas for tracking down the cause of the crash.
When I inspect the heap at the address that the failed read targeted, this is what I see:
!heap -i 000001f76dea0f78
Detailed information for block entry 000001f76dea0f78
Assumed heap : 0x000001f76df60000 (Use !heap -i NewHeapHandle to change)
Header content : 0x00000000 0x00000000
Owning segment : 0x000001f76df60000 (offset 0)
Block flags : 0x0 (free )
Total block size : 0x0 units (0x0 bytes)
Previous block size: 0x74e units (0x74e0 bytes)
Block CRC : OK - 0x0
List corrupted: (Blink->Flink = 0000000000000000) != (Block = 000001f76dea0f88)
Free list entry : CORRUPTED
Previous block : 0x000001f76de99a98
Next block : 0x000001f76dea0f78
Any ideas on how I can track down the cause of this issue, or on what these functions are trying to do, would be appreciated.
Update: Both of these services used a DLL containing a type library for AzMan (Windows Authorization Manager), which is accessed through COM interop. I developed my own C# library to read the AzMan data, and the crash finally stopped happening. Microsoft has listed AzMan as deprecated for several years, but their own software still uses it, so it has remained part of Windows. Fair warning to anyone who is still using it.
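For anyone chasing a similar crash in a service that is supposed to be fully managed, it may help to confirm which modules actually end up loaded in the process, since a COM interop dependency brings native DLLs with it. The following is only a minimal sketch: the class name `LoadedModuleReport` and its method are hypothetical names chosen for illustration, and it simply lists every module in the current process using `System.Diagnostics.Process`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper (not part of either service): reports the modules
// loaded into the current process so unexpected native DLLs stand out.
public static class LoadedModuleReport
{
    // Returns the names of all modules currently loaded in this process,
    // sorted case-insensitively.
    public static List<string> GetLoadedModuleNames()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return current.Modules
                .Cast<ProcessModule>()
                .Select(m => m.ModuleName)
                .OrderBy(n => n, StringComparer.OrdinalIgnoreCase)
                .ToList();
        }
    }

    public static void Main()
    {
        // Print one module name per line; in a real service this would
        // more likely go to the service's own log.
        foreach (string name in GetLoadedModuleNames())
        {
            Console.WriteLine(name);
        }
    }
}
```

Logging this list once after startup and again after the interop code has run should show whether a native COM server was pulled in. In a crash dump, the `lm` command in WinDbg gives the same information.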