Neo4j Embedded 2.2.1: Exception in thread "GC-Monitor" java.lang.OutOfMemoryError: Java heap space


I am trying to do a batch insertion into an existing database, but I get the following exception:

Exception in thread "GC-Monitor" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2245)
    at java.util.Arrays.copyOf(Arrays.java:2219)
    at java.util.ArrayList.grow(ArrayList.java:242)
    at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
    at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
    at java.util.ArrayList.add(ArrayList.java:440)
    at java.util.Formatter.parse(Formatter.java:2525)
    at java.util.Formatter.format(Formatter.java:2469)
    at java.util.Formatter.format(Formatter.java:2423)
    at java.lang.String.format(String.java:2792)
    at org.neo4j.kernel.impl.cache.MeasureDoNothing.run(MeasureDoNothing.java:64)
Fail: Transaction was marked as successful, but unable to commit transaction so rolled back.

Here is the structure of my insertion code:

public void parseExecutionRecordFile(Node episodeVersionNode, String filePath, Integer insertionBatchSize) throws Exception {
        Gson gson = new Gson();
        BufferedReader reader = new BufferedReader(new FileReader(filePath));
        String aDataRow = "";
        List<ExecutionRecord> executionRecords = new LinkedList<>();

        Integer numberOfProcessedExecutionRecords = 0;
        Integer insertionCounter = 0;
        ExecutionRecord lastProcessedExecutionRecord = null;
        Node lastProcessedExecutionRecordNode = null;

        Long start = System.nanoTime();
        while((aDataRow = reader.readLine()) != null) {
            JsonReader jsonReader = new JsonReader(new StringReader(aDataRow));
            jsonReader.setLenient(true);
            ExecutionRecord executionRecord = gson.fromJson(jsonReader, ExecutionRecord.class);
            executionRecords.add(executionRecord);

            insertionCounter++;

            if(insertionCounter == insertionBatchSize || executionRecord.getType() == ExecutionRecord.Type.END_MESSAGE) {
                lastProcessedExecutionRecordNode = appendEpisodeData(episodeVersionNode, lastProcessedExecutionRecordNode, executionRecords, lastProcessedExecutionRecord == null ? null : lastProcessedExecutionRecord.getTraceSequenceNumber());
                executionRecords = new LinkedList<>();
                lastProcessedExecutionRecord = executionRecord;
                numberOfProcessedExecutionRecords += insertionCounter;
                insertionCounter = 0;
            }
        }
    }

public Node appendEpisodeData(Node episodeVersionNode, Node previousExecutionRecordNode, List<ExecutionRecord> executionRecordList, Integer traceCounter) {
        Iterator<ExecutionRecord> executionRecordIterator = executionRecordList.iterator();

        Node previousTraceNode = null;
        Node currentTraceNode = null;
        Node currentExecutionRecordNode = null;

        try (Transaction tx = dbInstance.beginTx()) {
            // some graph insertion

            tx.success();
            return currentExecutionRecordNode;
        }
    }

So basically, I read JSON objects from a file (ca. 20,000 objects) and insert them into Neo4j every 10,000 records. If the file contains only 10,000 JSON objects, it works fine. But with 20,000, it throws the exception above.

Thanks in advance and any help would be really appreciated!


There are 2 answers below.


If it works with 10,000 objects, try at least doubling the heap memory. Take a look at the following page: http://neo4j.com/docs/stable/server-performance.html

The wrapper.java.maxmemory option could resolve your problem.
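As a minimal sketch, for the Neo4j 2.x server that setting lives in conf/neo4j-wrapper.conf (the 2048/4096 values below are example sizes only, not recommendations); since the question uses an embedded database, the equivalent there is raising the heap of the JVM that runs your own application with -Xmx:

# conf/neo4j-wrapper.conf -- values are in MB, example sizes only
wrapper.java.initmemory=2048
wrapper.java.maxmemory=4096

# For embedded Neo4j, set the heap on your own JVM instead, e.g.:
#   java -Xmx4g -cp <your classpath> <your main class>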


Since you also insert several thousand properties, all of that transaction state is held in memory. So I think a 10k batch size is just fine for that amount of heap.

You also don't close your JsonReader, so it might linger around with the StringReader inside it.

You should also use an ArrayList initialized to your batch size and call list.clear() instead of recreating/reassigning the list.
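A minimal sketch of the reading loop with both suggestions applied (the JsonReader closed via try-with-resources, and a pre-sized ArrayList that is cleared and reused); the types and the appendEpisodeData call are taken from the question, the rest is an assumption about how you would wire it up:

// Sketch only. Needed imports: com.google.gson.Gson, com.google.gson.stream.JsonReader,
// java.io.*, java.util.*, org.neo4j.graphdb.Node, plus the question's own ExecutionRecord class.
public void parseExecutionRecordFile(Node episodeVersionNode, String filePath, Integer insertionBatchSize) throws Exception {
    Gson gson = new Gson();
    List<ExecutionRecord> executionRecords = new ArrayList<>(insertionBatchSize);
    ExecutionRecord lastProcessedExecutionRecord = null;
    Node lastProcessedExecutionRecordNode = null;

    try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
        String aDataRow;
        while ((aDataRow = reader.readLine()) != null) {
            ExecutionRecord executionRecord;
            // try-with-resources closes the JsonReader (and the StringReader it wraps) every iteration
            try (JsonReader jsonReader = new JsonReader(new StringReader(aDataRow))) {
                jsonReader.setLenient(true);
                executionRecord = gson.fromJson(jsonReader, ExecutionRecord.class);
            }
            executionRecords.add(executionRecord);

            if (executionRecords.size() == insertionBatchSize
                    || executionRecord.getType() == ExecutionRecord.Type.END_MESSAGE) {
                lastProcessedExecutionRecordNode = appendEpisodeData(
                        episodeVersionNode,
                        lastProcessedExecutionRecordNode,
                        executionRecords,
                        lastProcessedExecutionRecord == null ? null : lastProcessedExecutionRecord.getTraceSequenceNumber());
                lastProcessedExecutionRecord = executionRecord;
                executionRecords.clear(); // reuse the same list instead of allocating a new one per batch
            }
        }
    }
}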