Every couple of days my Java process on Ubuntu is killed automatically, and I can't figure out why.
My box has 35.84 GB of RAM. When I launch my Java process I pass it the -Xmx28g parameter, so it should be using far less than the RAM available.
I ran jstat as follows:
# jstat -gccause -t `pgrep java` 60000
The last few lines of output from jstat immediately before the process was killed were:
Time S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC
14236.1 99.98 0.00 69.80 99.40 49.88 1011 232.305 11 171.041 403.347 unknown GCCause No GC
14296.2 93.02 0.00 65.79 99.43 49.88 1015 233.000 11 171.041 404.041 unknown GCCause No GC
14356.1 79.20 0.00 80.50 99.55 49.88 1019 233.945 11 171.041 404.986 unknown GCCause No GC
14416.2 0.00 99.98 24.32 99.64 49.88 1024 234.945 11 171.041 405.987 unknown GCCause No GC
This seems to be what went down in the /var/log/syslog around this time: https://gist.github.com/1369135
There is really nothing running on this server other than my java app. What's going on?
Edit: I'm running Java version 1.6.0_20. The only notable parameters I pass to java on startup are "-server -Xmx28g". I'm not using an application server, but my app embeds the "Simple web framework".
Assuming the problem is the OOM killer, it has killed your process in a desperate attempt to keep the OS functioning during a severe memory shortage.
I would conclude that:
your JVM is actually using significantly more than 28 GB; i.e. you've got significant non-heap memory usage, and
the OS is not configured with an adequate amount of swap space.
I'd try adding more swap space, so that the OS can swap out parts of your application in an emergency.
Alternatively, reduce the JVM's heap size.
Note that "-Xmx ..." sets the maximum heap size, not the maximum amount of memory that your JVM can use. The JVM puts some stuff outside the heap, including such things as the memory for thread stacks and memory-mapped files that your application is using.
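To see the heap / non-heap split from inside the JVM, something like the following rough sketch could be used. It relies on the standard java.lang.management API; the class name HeapVsNonHeap and the describe helper are just names I made up for illustration. Bear in mind that the "non-heap" figure reported here covers only the JVM-managed non-heap pools (such as the permanent generation and code cache). It does not include thread stacks or memory-mapped files, so the resident size the OS reports for the process (for example via ps or top) is still the number that matters to the OOM killer.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, not part of the original application.
class HeapVsNonHeap {
    private static final long MIB = 1024L * 1024L;

    // Formats one MemoryUsage snapshot; getMax() is -1 when no maximum is defined.
    static String describe(String label, MemoryUsage usage) {
        String max = usage.getMax() < 0
                ? "undefined"
                : (usage.getMax() / MIB) + " MiB";
        return label + ": used=" + (usage.getUsed() / MIB) + " MiB"
                + ", committed=" + (usage.getCommitted() / MIB) + " MiB"
                + ", max=" + max;
    }

    public static void main(String[] args) {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        // The heap is what -Xmx limits.
        System.out.println(describe("heap", bean.getHeapMemoryUsage()));
        // JVM-managed memory outside the heap; -Xmx does not limit this.
        System.out.println(describe("non-heap", bean.getNonHeapMemoryUsage()));
    }
}
```

If "committed" for the heap plus the non-heap figure is already close to the physical RAM on the box, the extra native memory on top of that is a plausible way to end up over the limit.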
The syslog confirms that it is the OOM killer at work.
It says this:
Correct. It was killed by the operating system's OOM killer.
That is what would have happened if you had filled up the Java heap.
That is not what is going on here. The actual problem is that there is not enough physical RAM to hold the Java heap. The OOM killer deals with it ...
Unfortunately, you are trying to use way more RAM than is available on the system. This is causing virtual memory to thrash, affecting the entire operating system.
When the system starts to thrash badly, the OOM killer (not the JVM) identifies your Java process as the cause of the problem. It then kills it (with a SIGKILL) to protect the rest of the system. If it didn't, there is a risk that the entire system would lock up completely and need to be hard rebooted.
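One way to sanity-check whether the configured heap can fit in physical RAM at all is to compare the JVM's maximum heap with what Linux reports in /proc/meminfo. The sketch below is only an illustration under some assumptions: it assumes a Linux box where /proc/meminfo has the usual "Key:   value kB" lines, the class name MemInfoCheck and its methods are hypothetical, and Runtime.maxMemory() is only approximately the -Xmx value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for illustration only.
class MemInfoCheck {
    // Returns the value in kB for a key such as "MemTotal", or -1 if the key is absent.
    static long readKb(String meminfo, String key) {
        for (String line : meminfo.split("\n")) {
            if (line.startsWith(key + ":")) {
                String[] parts = line.substring(key.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    // True if the maximum heap is smaller than physical RAM.
    // This ignores non-heap memory and other processes, so "true" is no guarantee.
    static boolean heapFitsInRam(long maxHeapBytes, long memTotalKb) {
        return maxHeapBytes < memTotalKb * 1024L;
    }

    public static void main(String[] args) throws IOException {
        String text = new String(
                Files.readAllBytes(Paths.get("/proc/meminfo")), StandardCharsets.UTF_8);
        long memTotalKb = readKb(text, "MemTotal");
        long swapTotalKb = readKb(text, "SwapTotal");
        long maxHeapBytes = Runtime.getRuntime().maxMemory();

        System.out.println("MemTotal:  " + memTotalKb + " kB");
        System.out.println("SwapTotal: " + swapTotalKb + " kB");
        System.out.println("Max heap:  " + (maxHeapBytes / 1024L) + " kB");
        System.out.println("Heap fits in physical RAM: "
                + heapFitsInRam(maxHeapBytes, memTotalKb));
    }
}
```

Run it with the same -Xmx setting as the real application. A SwapTotal of 0 kB, or a maximum heap close to or above MemTotal, would be consistent with the behavior described above.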
Finally, you said that your box has 35.84 GB of RAM.
That is rather a strange value. 32 GiB is 34,359,738,368 bytes, or about 34.36 GB.
But based on that and the observed behavior, I suspect that that is the available virtual memory rather than physical RAM. Alternatively, your "box" could be a virtual machine with RAM overcommit enabled at the hypervisor level.
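For anyone who wants to check the GiB versus GB arithmetic, here is a small sketch. The class and method names are made up for this post; the only facts it relies on are that 1 GiB is 2^30 bytes and 1 GB is 10^9 bytes.

```java
import java.util.Locale;

// Hypothetical helper class for illustration only.
class MemoryUnits {
    static final long GIB = 1L << 30;          // 1,073,741,824 bytes
    static final long GB = 1_000_000_000L;     // 10^9 bytes

    static long gibToBytes(long gib) {
        return gib * GIB;
    }

    static double bytesToGb(long bytes) {
        return (double) bytes / GB;
    }

    public static void main(String[] args) {
        long bytes = gibToBytes(32);
        System.out.println(bytes + " bytes");                                  // 34359738368 bytes
        System.out.println(String.format(Locale.ROOT, "%.2f GB", bytesToGb(bytes))); // 34.36 GB
    }
}
```

Neither 32 GiB nor any other power-of-two amount of RAM comes out as exactly 35.84 GB, which is why the figure looks odd.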