Memory leak in ConcurrentLinkedQueue$Node objects


I have a system in which many threads produce logs that are inserted into a NoSQL backend. To reduce network traffic, I introduced a buffer between the server and the backend.

The environment is:

Java, JSP, Spring MVC, JDK 1.7, Apache Tomcat 6

The buffer is a ConcurrentLinkedQueue. I also implemented a DBPushThread that fetches logs from the queue every 5 seconds and inserts them into the backend. We use offer() for insertion and poll() for removal. Per the javadoc of poll() - https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html#poll%28%29 - it retrieves the element and updates the head of the queue, so the removed node should no longer be referenced and should eventually be garbage collected.
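For reference, the described producer/consumer setup can be sketched roughly like this (LogBuffer, Log, and drain are illustrative names I'm using here, not the actual application classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch of the architecture described above; the real
// code differs, but the offer()/poll() roles are the same.
class Log {
    final String message;
    Log(String message) { this.message = message; }
}

class LogBuffer {
    private final ConcurrentLinkedQueue<Log> buffer = new ConcurrentLinkedQueue<>();

    // Producers (request threads) enqueue; offer() never blocks.
    public void add(Log log) {
        buffer.offer(log);
    }

    // Consumer (the DBPushThread) drains; poll() unlinks the head node,
    // making it unreachable and thus eligible for garbage collection.
    public List<Log> drain() {
        List<Log> logs = new ArrayList<>();
        Log log;
        while ((log = buffer.poll()) != null) {
            logs.add(log);
        }
        return logs;
    }
}
```

A ScheduledExecutorService (or a plain thread with a 5-second sleep loop) invoking drain() would play the role of the DBPushThread.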

I ran the server for one day and observed that it became increasingly sluggish over time. I took a heap dump (hprof) using JVisualVM and, while analyzing it, found more than 1,500,000 instances of ConcurrentLinkedQueue$Node. In the instance view, I can see that the node's value (the "item" property) and its reference to the next node (the "next" property) are null for most of the objects. That means these Node objects should be candidates for garbage collection, but it is not happening, and the dereferenced Node objects pile up in memory. [heap dump screenshot of the instance view]

Additional code snippet:

public void add(Log log) {
        buffer.offer(log);
    }

Retrieving contents from the queue (maxIndex is always specified as the queue size):

public List<Log> getContents(int maxIndex) {
    List<Log> logs = new LinkedList<Log>();

    for (int i = 0; i < maxIndex; i++) {
        Log log = buffer.poll();
        if (log == null) {   // queue drained concurrently; poll() returns null
            break;
        }
        logs.add(log);       // was logs.add(Log) - a typo that would not compile
    }
    return logs;
}

Only the buffer (the singleton queue) is an instance variable; everything else is local to the method.

Is it a bug in JDK 1.7 that the abandoned nodes never get garbage collected?

OR

Do I need to implement object pooling on top of ConcurrentLinkedQueue? If so, how can I achieve it?

OR

Is it a bug in my code?

Please guide.


There are 2 answers below.

BEST ANSWER

As the8472 pointed out, I analyzed the dump and confirmed that it is not an issue with ConcurrentLinkedQueue's poll() and offer() methods.

In our architecture, the ConcurrentLinkedQueue acts as a buffer in which logs pile up, and a DBPushThread fetches the logs from the queue and inserts them into the backend storage. The backend is Elasticsearch.

Due to intermittent stability/scaling issues with Elasticsearch, the DBPushThread's insertion of logs sometimes failed and threw an exception, which we rethrew. Since the code runs in its own thread, it became an uncaught exception that killed the thread, and the parent thread was never notified.
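One way to at least detect such silent thread deaths (a standard JDK facility, not something we had in place at the time) is to register an UncaughtExceptionHandler on the worker thread:

```java
import java.util.concurrent.atomic.AtomicReference;

// Demonstrates detecting a worker thread dying from an uncaught
// exception. The "backend insert failed" failure is simulated.
public class UncaughtDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicReference<Throwable> seen = new AtomicReference<>();

        // Simulates the DBPushThread dying on a backend failure.
        Thread pusher = new Thread(() -> {
            throw new RuntimeException("backend insert failed");
        });

        // Without a handler, the stack trace goes to stderr and nobody
        // is notified; with one, the parent can log, alert, or restart.
        pusher.setUncaughtExceptionHandler((t, e) -> seen.set(e));

        pusher.start();
        pusher.join();

        System.out.println("worker died with: " + seen.get().getMessage());
    }
}
```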

So logs kept being injected into the queue, but nothing was ever polled from it (since the DBPushThread had died). By handling the Elasticsearch issues and catching exceptions while inserting data into Elasticsearch, we were able to fix this issue.
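The fix amounts to catching failures inside the push loop so that one bad insert cannot kill the consumer. A minimal sketch, with the backend and retry policy heavily simplified (insertToBackend and pushOnce are hypothetical names):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Simplified sketch of the fixed push loop. The real code talks to
// Elasticsearch; backendHealthy just simulates an unstable backend.
public class ResilientPush {
    static boolean backendHealthy = false;

    static void insertToBackend(String log) {
        if (!backendHealthy) {
            throw new RuntimeException("backend unavailable");
        }
    }

    static void pushOnce(ConcurrentLinkedQueue<String> buffer) {
        String log;
        while ((log = buffer.poll()) != null) {
            try {
                insertToBackend(log);
            } catch (RuntimeException e) {
                // Catch instead of rethrow: an uncaught exception here
                // would kill the thread and the queue would grow forever.
                buffer.offer(log);  // re-queue for the next cycle
                break;
            }
        }
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> buffer = new ConcurrentLinkedQueue<>();
        buffer.offer("log-1");

        pushOnce(buffer);                   // backend down: entry is re-queued
        System.out.println(buffer.size());  // 1

        backendHealthy = true;
        pushOnce(buffer);                   // backend back: queue drains
        System.out.println(buffer.size());  // 0
    }
}
```

Note that the catch matters even under a ScheduledExecutorService: an exception escaping a task scheduled with scheduleAtFixedRate cancels all subsequent runs of that task.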

We monitored the system for about one month and the memory footprint stayed consistent. Thanks to the8472 for pointing me in the right direction.

On

On checking the instance view, I can see the LinkedList node value(property "item") and its reference to next node (property "next") is set to null for most of the objects.

No, those are outgoing references. Instead, you should check the incoming references to those objects - something is still holding onto them.

From your screenshot it actually looks like both the CLQ's head and tail point to instance #5, which makes me wonder what all the other Node instances are referenced by.

Generally you have to analyze paths to GC roots to find what's holding onto the objects.

CLQ complicates that analysis since it lazily updates/clears some pointers; those updates may be skipped under concurrent access, but such nodes should get cleaned up later, i.e. they shouldn't keep piling up.

And you should also check whether your heap dump profiler shows "floating garbage", i.e. objects that are eligible for collection but simply have not been collected yet. You might be barking up the wrong tree if that's the case.
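One way to rule out floating garbage is a WeakReference check (a generic diagnostic trick, not tied to the poster's code): if a polled item disappears after a forced GC, it was merely uncollected, not leaked. Taking the dump with jmap's live option (jmap -dump:live,...), which forces a full GC first, achieves the same filtering at dump time.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentLinkedQueue;

// Checks whether a polled item is truly unreachable. If it were still
// referenced somewhere (a real leak), the WeakReference would never clear.
public class FloatingGarbageCheck {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentLinkedQueue<Object> queue = new ConcurrentLinkedQueue<>();
        queue.offer(new Object());

        // After poll(), the only reference left is the weak one.
        WeakReference<Object> ref = new WeakReference<>(queue.poll());

        // If the item is only floating garbage, a forced GC clears it.
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(50);
        }

        System.out.println(ref.get() == null
            ? "collected: it was floating garbage, not a leak"
            : "still reachable: something is holding onto it");
    }
}
```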