Exposing whether an application is undergoing GC via UDP


The motivation behind this question is to see whether we can make a theoretical load balancer more efficient for edge cases. It would first apply its regular strategy for nominating a node to route an HTTP request to (say, round robin) and then "peek" into the internal state of that node to see whether it is undergoing garbage collection. If so, the load balancer avoids the node altogether and moves on to the next one.

In the ideal scenario, each node would "emit" its internal state every few seconds via UDP to some message queue letting the load balancer know which nodes are potentially "radio-active" if they're going through GC (I'm visualizing it as a simple boolean).

The question here is: can I tweak my application to tap into its JVM's internal state and (a) figure out whether we're in GC mode right this instant, and (b) emit this result via some protocol (UDP/HTTP) over the wire to some other entity (like an MQ or something)?


3
On BEST ANSWER

There are a whole bunch of ways to monitor and report on a VM remotely. A well-known protocol, for example, is SNMP. But this is a very complicated subject.

Implementation sort of depends on your requirements. If you need to be really sure a VM is in a good state, you might need to wrap your application in a wrapper VM that controls the actual VM. This is pretty involved.

Many implementations use the built-in monitoring and profiling interfaces that are exposed as beans to participating applications via JMX. Again, this requires a fair amount of tweaking.
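As a rough illustration of the JMX route, the standard `GarbageCollectorMXBean` exposes cumulative collection time, which can be sampled periodically to estimate how much of the last window was spent in GC. This is only a sketch: the class name `GcPoller` and its methods are made up for this example, and with concurrent collectors the reported time may not correspond purely to pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPoller {
    /** Sums accumulated collection time (ms) across all collectors in this JVM. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means "undefined for this collector"
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of a sampling window spent in GC, given two samples of totalGcMillis(). */
    public static double gcFraction(long gcBefore, long gcAfter, long windowMillis) {
        if (windowMillis <= 0) {
            return 0.0;
        }
        long delta = Math.max(0L, gcAfter - gcBefore);
        return Math.min(1.0, delta / (double) windowMillis);
    }
}
```

A node could call `totalGcMillis()` every few seconds and report the resulting fraction rather than a bare boolean, which gives the load balancer a little more to go on.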

I suppose you could create a worker thread that simply acts as a canary. It broadcasts a ping every X seconds, and if the pinged service misses two or three pings, it assumes the VM is not ready to serve.
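A minimal sketch of such a canary, assuming plain UDP and a receiver that tracks the time of the last ping per node. The class name `Canary`, the `nodeId:sequence` payload format, and the "allowed missed pings" rule are all assumptions made for this example, not an established protocol.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class Canary {
    /** Builds the ping payload: node id plus a sequence number. */
    public static byte[] payload(String nodeId, long seq) {
        return (nodeId + ":" + seq).getBytes(StandardCharsets.UTF_8);
    }

    /** Receiver-side rule: a node is suspect once more than missedAllowed intervals pass silently. */
    public static boolean isSuspect(long lastPingMillis, long nowMillis,
                                    long intervalMillis, int missedAllowed) {
        return nowMillis - lastPingMillis > intervalMillis * missedAllowed;
    }

    /** Starts a daemon thread that sends one UDP ping, then waits intervalMillis, and repeats. */
    public static ScheduledExecutorService start(String nodeId, InetAddress host,
                                                 int port, long intervalMillis) throws Exception {
        DatagramSocket socket = new DatagramSocket();
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "gc-canary");
            t.setDaemon(true); // do not keep the JVM alive just for the canary
            return t;
        });
        AtomicLong seq = new AtomicLong();
        // Fixed delay (not fixed rate) so a long pause is not followed by a burst of catch-up pings.
        exec.scheduleWithFixedDelay(() -> {
            try {
                byte[] data = payload(nodeId, seq.getAndIncrement());
                socket.send(new DatagramPacket(data, data.length, host, port));
            } catch (Exception e) {
                // UDP is best effort; a failed send simply looks like a missed ping.
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

Because the canary thread is itself frozen during a stop-the-world pause, the gap in pings is the signal; the receiver only needs `isSuspect` and a timestamp per node.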

The problem is deciding what to do when a VM never seems to come back. Is it the VM, the network, or something else? How do you keep track of the VMs? These are not intractable problems, but they combine in interesting ways to make your life equally interesting.

There are a lot of ways to approach this problem, and each has subtle implications.

2
On

Can you do it? Yes.

The GarbageCollectorMXBean can provide notifications of GC events to application code. (For instance, see this article which includes example code for configuring a notification listener and processing the events.)

Given this, you could easily code your application so that key GC events were sent out as UDP messages, and/or regular UDP messages were sent to report the current GC state.
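Here is a hedged sketch of what that could look like on a HotSpot-style JVM, using the `com.sun.management.GarbageCollectionNotificationInfo` extension. The class name `GcUdpReporter` and the pipe-separated message format are invented for this example. Note that these notifications are emitted when a collection completes, so the message says "a GC just finished and took this long" rather than "a GC is in progress right now".

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.io.IOException;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

public class GcUdpReporter {
    /** Hypothetical wire format: node|collector|action|durationMillis. */
    public static String format(String nodeId, String gcName, String action, long durationMillis) {
        return nodeId + "|" + gcName + "|" + action + "|" + durationMillis;
    }

    /** Registers one listener on every collector bean that can emit notifications. */
    public static void install(String nodeId, InetAddress host, int port) throws Exception {
        DatagramSocket socket = new DatagramSocket();
        NotificationListener listener = (Notification n, Object handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION.equals(n.getType())) {
                return; // ignore unrelated notification types
            }
            GarbageCollectionNotificationInfo info =
                    GarbageCollectionNotificationInfo.from((CompositeData) n.getUserData());
            byte[] data = format(nodeId, info.getGcName(), info.getGcAction(),
                    info.getGcInfo().getDuration()).getBytes(StandardCharsets.UTF_8);
            try {
                socket.send(new DatagramPacket(data, data.length, host, port));
            } catch (IOException e) {
                // Best effort: a lost report must not disturb the application.
            }
        };
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
            }
        }
    }
}
```

Calling `GcUdpReporter.install("node-1", InetAddress.getByName("lb.example"), 9999)` once at startup would be enough; the host name and port here are placeholders.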

However, if the GC performs a "stop the world" collection, then your code to send out messages will also be stopped, and there is no way around that [1]. If this is a problem then you probably need to take the "canary" approach ... or switch to a low-pause collector. The "canary" or "heart-beat" approach also detects other kinds of unavailability, which will be relevant to a load balancer. However, the flip side is that you can also get false positives; e.g. the "heart" is still "beating" but the "patient" is "comatose".

Whether this is actually going to be useful for load balancing purposes is a different question entirely. There is certainly scope for additional failure modes. For instance, if the load balancer misses a UDP message saying that a JVM GC has finished, then the JVM could effectively drop out of the load balancer's pool.
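One way to soften that particular failure mode, sketched here under the assumption that the load balancer keeps its own per-node state, is to let an "in GC" mark expire after a time-to-live so that a lost "finished" message cannot exclude a node forever. The class `GcAwarePool` and its method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GcAwarePool {
    private final long ttlMillis;
    // node id -> time at which the node was last reported as being in GC
    private final Map<String, Long> gcStartedAt = new ConcurrentHashMap<>();

    public GcAwarePool(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public void onGcStarted(String node, long nowMillis) {
        gcStartedAt.put(node, nowMillis);
    }

    public void onGcFinished(String node) {
        gcStartedAt.remove(node);
    }

    /** A node is routable if it has no "in GC" mark, or if that mark is older than the TTL. */
    public boolean isRoutable(String node, long nowMillis) {
        Long started = gcStartedAt.get(node);
        if (started == null) {
            return true;
        }
        if (nowMillis - started >= ttlMillis) {
            gcStartedAt.remove(node, started); // stale mark: the "finished" message was probably lost
            return true;
        }
        return false;
    }
}
```

The TTL would need to sit comfortably above the longest pause you expect, otherwise the load balancer starts routing to nodes that are still collecting.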


[1] At least, not within Java. You could conceivably build something on the outside of the JVM that (for example) reads the GC log file and checks OS-level process usage information.

0
On

You can write an external application that instruments the JVM, e.g. via DTrace probes, and either sends the events to the load balancer or can be queried by the load balancer.