StackOverflowException in .NET >= 4.0 - give other threads chance to gracefully exit

284 Views Asked by At

Is there a way how to at least postpone termination of managed app (by few dozens of milliseconds) and set some shared flag to give other threads chance to gracefully terminate (the SO thread itself wouldn't obviously execute anything further)? I'm contemplating to use JIT debugger or CLR hosting for this - I'm curios if anybody tried this before.

Why would I want to do something so wrong?:

Without too much detail - imagine this analogy - you are in a casino betting on a roulette and suddenly find out that the roulette is unreliable fake. So you want to immediately leave the casino, BUT likely want to collect your bets from the table first. Unfortunately I cannot leverage separate process for this as there are very tight performance requirements.

Tried and didn't work:

.NET behavior for StackOverflowException (and contradicting info on MSDN) has been discussed several times on SO - to quickly sum up:

HandleProcessCorruptedStateExceptionsAttribute (e.g. on appdomain unhandled exception handler) doesn't work

ExecuteCodeWithGuaranteedCleanup doesn't work

legacyUnhandledExceptionPolicy doesn't work

There may be few other attempts how to handle StackOverflowExceptions - but it seems to be apparent that CLR terminates the whole process as is mentioned in this great answer by Hans Passant.

Considering to try:

  • JIT debugger - leave the thread with exception frozen, set some shared flag (likely in pinned location) and thaw other threads for a short time.
  • CLR hosting and setting unhandled exception policy

Do you have any other idea? Or any experience (successful/unsuccessful) with those two ways?

2

There are 2 best solutions below

4
On

A StackOverflowException is an immediate and critical exception from which the runtime cannot recover - that's why you can't catch it, or recover from it, or anything else. In order to run another method (whether that's a cleanup method or anything else), you have to be able to create a stack frame for that method, and the stack is already full (that's what a StackOverflowException means!). You can't run another method because running a method is what causes the exception in the first place!

Fortunately, though, this kind of exception is always caused by program structure. You should be able to diagnose and fix the error in your code: when you get the exception, you will see in your call stack that there's a loop of one or more methods recursing indefinitely. You need to identify what the faulty logic is and fix it, and that'll be a lot easier than trying to fix the unfixable exception.

1
On

The word "fake" isn't quite the correct one for your casino analogy. There was a magnitude 9 earth quake and the casino building along with the roulette table, the remaining chips and the player disappeared in a giant cloud of smoke and dust.

The only shot you have at running code after an SOE is to stay far away from that casino, it has to run in another process. A "guard" process that starts your misbehaving program, it can use the Process.ExitCode to detect the crash. It will be -1073741571 (0xc00000fd). The process state is gone, you'll have to use one of the .NET out-of-process interop methods (like WCF, named pipes, sockets, memory-mapped file) to make the guard process aware of things that need to be done to clean up. This needs to be transactional, you cannot reason about the exact point in time that the crash occurred since it might have died while updating the guard.

Do beware that this is rarely worth the effort. Because an SOE is pretty indistinguishable from an everyday process abort. Like getting killed by Task Manager. Or the machine losing power. Or being subjected to the effects of an earth quake :)