I have a system in which many threads produce logs that are inserted into a NoSQL backend. To reduce network traffic, I introduced a buffer between the server and the backend.
The environment is:
Java, JSP, Spring MVC, JDK 1.7, Apache Tomcat 6
The buffer is a ConcurrentLinkedQueue. I also implemented a DBPushThread that fetches logs from the queue every 5 seconds and inserts them into the backend. We use offer() for insertion and poll() for removal. According to the javadoc of poll() - https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html#poll%28%29 - it retrieves and removes the head of the queue, so the removed node is no longer referenced and should eventually be garbage collected.
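For illustration, a minimal, self-contained demo of the offer()/poll() semantics described above (this is not the production code):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> buffer = new ConcurrentLinkedQueue<String>();
        buffer.offer("log-1");
        buffer.offer("log-2");

        // poll() removes and returns the head, unlinking its internal node
        System.out.println(buffer.poll()); // log-1
        System.out.println(buffer.size()); // 1

        // On an empty queue poll() returns null instead of blocking
        buffer.poll();
        System.out.println(buffer.poll()); // null
    }
}
```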
I ran the server for one day and observed that it became increasingly sluggish over time. I took a heap dump (hprof) with JVisualVM and, while analyzing it, found more than 1,500,000 instances of ConcurrentLinkedQueue$Node. In the instance view I can see that both the node's value (field "item") and its reference to the next node (field "next") are null for most of these objects. That means these Node objects are candidates for garbage collection, yet they are not collected and the dereferenced Node objects pile up in memory.
Code snippet for adding to the queue:
public void add(Log log) {
    buffer.offer(log);
}
Retrieving contents from the queue (maxIndex is always specified as the queue size):
public List<Log> getContents(int maxIndex) {
    List<Log> logs = new LinkedList<Log>();
    for (int i = 0; i < maxIndex; i++) {
        Log log = buffer.poll();
        if (log == null) { // queue was drained concurrently; stop early
            break;
        }
        logs.add(log);
    }
    return logs;
}
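For context, the 5-second drain cycle described earlier could be sketched as follows. DBPushScheduler, Log, and pushToBackend are illustrative names rather than the actual code, and JDK 1.7 syntax is used (no lambdas):

```java
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the buffer + periodic-push architecture.
public class DBPushScheduler {

    static class Log {
        final String message;
        Log(String message) { this.message = message; }
    }

    private final ConcurrentLinkedQueue<Log> buffer = new ConcurrentLinkedQueue<Log>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void add(Log log) {
        buffer.offer(log);
    }

    public void start() {
        // Drain the queue and push the batch to the backend every 5 seconds
        scheduler.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                List<Log> batch = new LinkedList<Log>();
                Log log;
                // poll() returns null on an empty queue, so this loop
                // drains whatever is buffered without blocking
                while ((log = buffer.poll()) != null) {
                    batch.add(log);
                }
                if (!batch.isEmpty()) {
                    pushToBackend(batch);
                }
            }
        }, 5, 5, TimeUnit.SECONDS);
    }

    private void pushToBackend(List<Log> batch) {
        // insert into the NoSQL backend here (elided)
    }
}
```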
Only the buffer (the singleton queue) is an instance variable; everything else is local to the method.
Is it a bug in JDK 1.7 that the abandoned nodes never get garbage collected?
OR
Do I need to implement object pooling on top of ConcurrentLinkedQueue? If so, how can I achieve it?
OR
Is it a bug in my code?
Please guide.
As the8472 pointed out, I analyzed the dump and observed that the problem is not with ConcurrentLinkedQueue's poll() and offer() methods.
In our architecture, the ConcurrentLinkedQueue acts as a buffer in which logs pile up, and a DBPushThread fetches the logs from the queue and inserts them into the backend storage. The backend is Elasticsearch.
Due to intermittent stability/scaling issues with Elasticsearch, the DBPushThread's insertion of logs sometimes failed with an exception, which we re-threw. Since this happens in a separate thread, it becomes an uncaught exception and the parent thread is never notified.
So the DBPushThread died: lots of logs kept being offered to the queue, but nothing was ever polled from it. By handling the Elasticsearch issues and catching the exceptions while inserting data into Elasticsearch, we were able to fix the problem.
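The fix boils down to catching the backend exception inside the thread's run loop, so the DBPushThread survives transient Elasticsearch failures instead of dying. A minimal sketch (pushBatch() is a placeholder for the actual fetch-and-insert logic):

```java
public class DBPushThread extends Thread {
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                pushBatch(); // drain the queue and insert into the backend
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and exit
            } catch (Exception e) {
                // Previously this exception propagated out of run(), killing
                // the thread while producers kept filling the queue.
                // Catching it here keeps the thread alive across backend outages.
                e.printStackTrace();
            }
        }
    }

    private void pushBatch() {
        // hypothetical Elasticsearch insert; may throw on backend failures
    }
}
```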
We monitored the system for about one month and the memory footprint has stayed stable. Thanks to the8472 for pointing me in the right direction.