Throwing a C++ exception with libunwind loaded on PowerPC sets random floating-point exception traps


I'm currently debugging a failure in PyTorch, a Python library with a C++ extension, so the Python code calls into C++ code.

The failure happens because some floating-point exception (FPE) traps get enabled before a seemingly innocent std::exp call, which then causes a core dump. Strangely, when I reduce this to a minimal example that just enables the traps via feenableexcept and then calls std::exp with the same values, there is no crash or core dump. So I'm stuck debugging the original application.
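For reference, the kind of minimal reduction described above can be sketched roughly like this. The helper name exp_with_traps and its arguments are placeholders (the actual values and trap set from the failing run are not shown here), and feenableexcept/fedisableexcept are glibc extensions, so this assumes g++ on Linux:

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <fenv.h>  // feenableexcept/fedisableexcept/fegetexcept (glibc extensions)

// Hypothetical helper: enable the given FP traps, evaluate std::exp(x),
// then put the previous trap mask back.
double exp_with_traps(double x, int traps) {
    std::feclearexcept(FE_ALL_EXCEPT);           // no stale flags before unmasking
    const int previous = feenableexcept(traps);  // previously enabled traps, -1 on failure
    const double result = std::exp(x);           // delivers SIGFPE if a trapped exception is raised
    if (previous != -1) {
        std::feclearexcept(FE_ALL_EXCEPT);       // avoid trapping on flags raised by exp itself
        fedisableexcept(FE_ALL_EXCEPT);
        feenableexcept(previous);
    }
    return result;
}
```

Calling it as exp_with_traps(1.0, FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW) returns normally, which matches the observation that enabling the traps by hand is not enough to reproduce the crash.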

With some printf debugging (of course the code doesn't break, i.e. the traps are not set, when compiled in debug mode) I narrowed it down to a throw c10::Error(...) statement. This class is derived from std::exception, so nothing unusual here. To translate that C++ exception into a Python exception, a catch(...){ /*set a bool*/; throw;}catch(c10::Error&){...} is entered. Nothing looks odd so far, and of course this also does not reproduce in a minimal setup doing the same.
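The shape of that catch/rethrow/catch sequence, as I read it, is roughly the following. This is only a sketch: ExampleError stands in for c10::Error, and Outcome and run_translated are made-up names, not PyTorch code:

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>

// Stand-in for c10::Error: derived from std::exception, nothing else special.
struct ExampleError : std::exception {
    explicit ExampleError(std::string msg) : msg_(std::move(msg)) {}
    const char* what() const noexcept override { return msg_.c_str(); }
    std::string msg_;
};

// Hypothetical result type, only used to make the control flow observable.
struct Outcome {
    bool saw_exception = false;  // set by the inner catch(...)
    std::string message;         // filled by the outer typed handler
};

template <typename F>
Outcome run_translated(F&& body) {
    Outcome outcome;
    try {
        try {
            std::forward<F>(body)();
        } catch (...) {
            outcome.saw_exception = true;  // "set a bool"
            throw;                         // rethrow the same exception object
        }
    } catch (const ExampleError& e) {
        outcome.message = e.what();        // where the Python exception would be raised
    }
    return outcome;
}
```

Each throw and rethrow goes through the unwinder, so there are two unwinding passes per translated exception.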

Using gdb with catch throw and catch catch, I got to the place where this exception is thrown and caught, single-stepped (step), and checked p fegetexcept() after each step. And indeed:

90  in ../../../../libstdc++-v3/libsupc++/eh_throw.cc
(gdb) p fegetexcept()
$20 = 0
(gdb) s
Catchpoint 4 (exception caught), __cxxabiv1::__cxa_begin_catch (exc_obj_in=0x11c6da60) at ../../../../libstdc++-v3/libsupc++/eh_catch.cc:42
42  ../../../../libstdc++-v3/libsupc++/eh_catch.cc: No such file or directory.
(gdb) p fegetexcept()
$21 = 536870912

So right inside the throw the traps are still not set, and right inside the catch they are. The line in eh_throw.cc is _Unwind_RaiseException (&header->exc.unwindHeader); which I can't step into.
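To make the raw numbers from fegetexcept() easier to read, a small decoder along these lines could be used. It is a sketch only: describe_traps is a made-up helper, and it relies on the FE_* macros from the platform headers rather than hard-coded bit values, since those differ between architectures:

```cpp
#include <cassert>
#include <cfenv>
#include <string>

// Hypothetical helper: turn a trap mask (e.g. the result of fegetexcept())
// into a readable list of the standard exception names.
std::string describe_traps(int mask) {
    struct Entry { int bit; const char* name; };
    static const Entry entries[] = {
        {FE_DIVBYZERO, "FE_DIVBYZERO"},
        {FE_INEXACT,   "FE_INEXACT"},
        {FE_INVALID,   "FE_INVALID"},
        {FE_OVERFLOW,  "FE_OVERFLOW"},
        {FE_UNDERFLOW, "FE_UNDERFLOW"},
    };
    std::string out;
    for (const Entry& e : entries) {
        if (mask & e.bit) {
            if (!out.empty()) out += '|';
            out += e.name;
        }
    }
    if (out.empty()) return "none";
    return out;
}
```

Logging describe_traps(fegetexcept()) right before the throw and at the top of the handler would show which traps appear, without having to decode the integer by hand.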

Also, the value of fegetexcept() differs from one program invocation to the next. Furthermore, the problem goes away if I do NOT build with GLOG, which I traced further to GLOG's use of libunwind.

However, I can't get any further than the point where libunwind calls setcontext, for which I only get assembly. At the instruction lfd fp29,(SIGCONTEXT_FP_REGS+(PT_R29*8))(r31) the value of fegetexcept() changes.
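As a stopgap while the root cause is unknown, one could save the floating-point environment before the code that may throw and restore it in the handler. This is only a sketch with a made-up helper name (run_restoring_fenv), and it assumes the damage is limited to the floating-point environment that fegetenv/fesetenv cover, which nothing above confirms. It hides the symptom and does not fix whatever setcontext restores incorrectly:

```cpp
#include <cassert>
#include <cfenv>
#include <stdexcept>
#include <utility>

// Hypothetical wrapper: run body(); if an exception propagates out of it,
// restore the FP environment captured before the call, then rethrow.
template <typename F>
void run_restoring_fenv(F&& body) {
    std::fenv_t saved;
    const bool have_saved = (std::fegetenv(&saved) == 0);
    try {
        std::forward<F>(body)();
    } catch (...) {
        if (have_saved) {
            std::fesetenv(&saved);  // resets the environment to the saved state
        }
        throw;  // the rethrow unwinds again, so callers may need the same guard
    }
}
```

The comment on the rethrow matters: the environment is only known to be sane between the fesetenv call and the next unwinding pass, so the outermost handler is the safest place for such a restore.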

So this looks like an issue in libunwind. However, the issue does not appear either when I use clang 9.0.1 instead of GCC 8.3.0. So I'm at a loss here.

Does anyone have an idea what the issue could be, what else I can try, or whether this is a known bug? This is with glibc 2.17 and libunwind 1.4.0, in case that matters.
