Load very large CSV into Neo4j

I want to load a set of large RDF triple files into Neo4j. I have already written a MapReduce job that reads all the input N-Triples and outputs two CSV files: nodes.csv (7 GB, 90 million rows) and relationships.csv (15 GB, 120 million rows).
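For illustration, the two files look roughly like this, assuming the header conventions of the 2.2 neo4j-import tool (the uri and name columns are made-up examples):

    # nodes.csv
    uri:ID,name,:LABEL
    http://example.org/alice,Alice,Resource
    http://example.org/bob,Bob,Resource

    # relationships.csv
    :START_ID,:END_ID,:TYPE
    http://example.org/alice,http://example.org/bob,KNOWS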

I tried the batch-import command from Neo4j v2.2.0-M01, but it crashes after loading around 30M node rows. My machine has 16 GB of RAM, so I set wrapper.java.initmemory=4096 and wrapper.java.maxmemory=13000. I then decided to split nodes.csv and relationships.csv into smaller parts and run batch-import on each part, but I don't know how to merge the databases created by the separate imports. I would appreciate any suggestion on how to load large CSV files into Neo4j.
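For reference, a typical invocation of the 2.2 import tool looks roughly like this (a sketch; paths and file names are illustrative). As far as I know, the tool accepts multiple comma-separated files per group, with the header taken from the first file, which may make merging separate databases unnecessary:

    bin/neo4j-import --into /data/graph.db \
        --nodes nodes-part1.csv,nodes-part2.csv \
        --relationships relationships-part1.csv,relationships-part2.csv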

2 Answers

Answer 1:

Why don't you try this approach (using Groovy): http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/

You can create a uniqueness constraint on the nodes so that duplicates won't be created.
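For example, in Cypher (the Resource label and uri property are placeholders; adapt them to your data model):

    CREATE CONSTRAINT ON (r:Resource) ASSERT r.uri IS UNIQUE

With the constraint in place, a MERGE on (:Resource {uri: ...}) matches the existing node instead of creating a duplicate.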

Answer 2:

I could finally load the data using the batch-import command in Neo4j 2.2.0-M02; it took 56 minutes in total. The issue that prevented Neo4j from loading the CSV files was having \" in some values: the quote was interpreted as a quotation character to be included in the field value, which threw off the parsing of everything from that point forward.
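In case it helps others, one way to repair such files is to rewrite the backslash-escaped quotes as RFC 4180 style doubled quotes before importing, e.g. (file names are illustrative; test on a sample first, since this blindly rewrites every \" sequence):

    sed 's/\\"/""/g' nodes.csv > nodes-fixed.csv
    sed 's/\\"/""/g' relationships.csv > relationships-fixed.csv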