Is there a way to identify CMS concurrent mode failures over JMX?

560 Views Asked by At

So I've been trying to track down a good way to monitor when the JVM might potentially be heading towards an OOM situation. They best way that seems to work with our app is to track back-to-back concurrent mode failures through CMS. This indicates that the tenured pool is filling up faster than it can actually clean itself up, or its reclaiming very little.

The JMX bean for tracking GCs has very generic information such as memory usage before/after and the like. This information has been relatively inconsistent at best. Is there a better way I can be monitoring this potential warning sign of a dying JVM?

2

There are 2 best solutions below

1
On

With standard JRE 1.6 GC, heap utilization can trend upwards overtime with the garbage collector running less and less frequently depending on the nature of your application and your maximum specified heap size. That said, it is hard to say what is going on without having more information.

A few methods to investigate further:

You could take a heap dump of your application while it is running using jmap, and then inspect the heap using jhat to see which objects are in heap at any given time.

You could also run your application with -XX:+HeapDumpOnOutOfMemoryError which will automatically make a heap dump on the first out of memory exception that the JVM encounters.

You could create a monitoring bean specific to your application, and create accessor methods you can hit with a remote JMX client. For example methods to return the sizes of queues and other collections that are likely places of memory utilization in your program.

HTH

1
On

Assuming you're using the Sun JVM then I am aware of 2 options;

  1. memory management mxbeans (API ref starts here) which you appear to be using already though note there are some hotspot specific internal ones you can get access to, see this blog for an example of how to use
  2. jstat: cmd reference is here, you'll want the -gccause option. You can either write a script to launch this and parse the output or, theoretically, you could spawn a process from the host jvm (or another one) that then reads the output stream from jstat to detect the gc causes. I don't think the cause reporting is 100% comprehensive though. I don't know a way to get this info programatically from java code.