How can I get the JVM to exit quickly after a SIGSEGV crash?

Asked by David Tinker on 19 January 2024 at 06:05

We have a service that crashes frequently due to some issue with TensorFlow Java. We can live with that (K8s restarts it, and there are lots of instances). The problem is that it takes several minutes for the JVM to die. Is there some way to force a quick exit on SIGSEGV in native code?

corrupted size vs. prev_size while consolidating
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fe4f321a898, pid=1, tid=545
#
# JRE version: OpenJDK Runtime Environment Zulu21.28+85-CA (21.0+35) (build 21+35)
# Java VM: OpenJDK 64-Bit Server VM Zulu21.28+85-CA (21+35, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libc.so.6+0x28898]  abort+0x178
#
# Core dump will be written. Default location: /data/core
#
# An error report file with more information is saved as:
# /data/hs_err_pid1.log

Some minutes later:

# [ timer expired, abort... ]
[thread 1037 also had an error]


There are 2 answers below

apangin On 19 January 2024 at 12:13 BEST ANSWER

Add the following JVM options:

-XX:+SuppressFatalErrorMessage -XX:-CreateCoredumpOnCrash

This will force the JVM to terminate immediately on SIGSEGV without creating an error report or a core dump. If you still want to see a fatal error message, replace -XX:+SuppressFatalErrorMessage with -XX:ErrorLogTimeout=1.
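
As a rough sketch of how these options might be applied without editing the launch command, they can be passed through the JAVA_TOOL_OPTIONS environment variable, which the JVM reads at startup. This is an assumption about the deployment, not something the question states: the same flags can equally go directly on the java command line, and app.jar below is a hypothetical placeholder. ErrorLogTimeout is given in seconds; as far as I know its default is two minutes, which would be consistent with the "timer expired, abort" line in the question.

```shell
# Variant 1: exit immediately, with no error report and no core dump.
export JAVA_TOOL_OPTIONS="-XX:+SuppressFatalErrorMessage -XX:-CreateCoredumpOnCrash"

# Variant 2 (alternative): keep the fatal error message, but give the
# error reporter at most 1 second before the JVM aborts.
# export JAVA_TOOL_OPTIONS="-XX:ErrorLogTimeout=1 -XX:-CreateCoredumpOnCrash"

# The JVM picks the variable up on startup (app.jar is a placeholder):
# java -jar app.jar
```

In a Kubernetes deployment the same variable could be set in the container's environment instead of a wrapper script.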

raner On 19 January 2024 at 07:17

I would suspect that this JVM is running with a pretty large heap (> 64 GB), and that it just takes a while to write out the core dump file for a process that uses so much memory:

# Core dump will be written. Default location: /data/core

During the several minutes that this takes you might see the core dump file growing in the above location (that would be an easy way to confirm this theory).

The remedy would be to disable the creation of core dump files, the details of which would depend on your specific operating system (but core dumps can be disabled on pretty much any UNIX-based operating system). Additionally, there might be some filesystem-related bottleneck with that specific location that causes core dumps to be written more slowly than one would expect.
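
For example, on Linux one common way to do this is to set the core file size limit to zero in the shell that launches the JVM. This is only a sketch under that assumption: the limit applies to the current shell and the processes it starts, so when the JVM is the container entrypoint (pid=1 in the log above) it would have to be set in a wrapper script or through the container runtime's ulimit settings instead.

```shell
# Set the core file size limit to 0 for this shell and its children,
# so processes started from here do not write core files.
ulimit -c 0

# Print the current limit to confirm the change.
ulimit -c
```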
