I want to load a set of large RDF triple files into Neo4j. I have already written MapReduce code to read all the input N-Triples files and output two CSV files: nodes.csv (7 GB, 90 million rows) and relationships.csv (15 GB, 120 million rows).
I tried the batch-import tool from Neo4j v2.2.0-M01, but it crashes after loading around 30M node rows. My machine has 16 GB of RAM, so I set wrapper.java.initmemory=4096 and wrapper.java.maxmemory=13000. I then decided to split nodes.csv and relationships.csv into smaller parts and run batch-import on each part, but I don't know how to merge the databases created by the separate imports. I'd appreciate any suggestions on how to load large CSV files into Neo4j.
Why don't you try this approach (using Groovy): http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/

You create a uniqueness constraint on the nodes, so duplicates won't be created.
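If you'd rather stay in plain Cypher instead of the Groovy batch inserter, the same idea (constraint first, then batched import) can be sketched with `LOAD CSV`, which is available in Neo4j 2.x. This is only a sketch: the label `Resource` and the column names `uri`, `subject`, `predicate`, and `object` are assumptions, so adjust them to your actual CSV headers.

```cypher
// Create the uniqueness constraint first; it also gives MERGE an index to use.
// (Label and property names below are placeholders for your actual schema.)
CREATE CONSTRAINT ON (r:Resource) ASSERT r.uri IS UNIQUE;

// Import nodes in batches so the transaction state fits in memory.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///nodes.csv' AS row
MERGE (n:Resource {uri: row.uri});

// Import relationships, looking up both endpoints via the constraint's index.
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///relationships.csv' AS row
MATCH (s:Resource {uri: row.subject})
MATCH (o:Resource {uri: row.object})
CREATE (s)-[:RELATED {predicate: row.predicate}]->(o);
```

For files of your size this will be slower than an offline batch import, but because `MERGE` is backed by the uniqueness constraint, you can safely run it over the split CSV parts one after another into the same database without creating duplicate nodes.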