I wrote a quick Ruby routine to load some very large CSV data. I got frustrated with various out-of-memory issues trying to use LOAD CSV, so I reverted to Ruby. I'm relatively new to Neo4j, so I'm using Neography to call a Cypher query that I build as a string.
The Cypher code uses MERGE to add a relationship between two existing nodes:
cmdstr = "MATCH (a:Provider {npi: xxx}), (b:Provider {npi: yyy}) MERGE (a)-[:REFERS_TO {qty: 1}]->(b)"
@neo.execute_query(cmdstr)
I'm just looping through the rows of a file and running these. It fails after about 30,000 rows with the socket error "cannot assign requested address". I believe GC is somehow causing issues, but the logs don't tell me anything. I've tried tuning GC differently and trying different amounts of heap; it fails in the same place every time. Any help appreciated.
[edit] More info: running netstat --inet shows thousands of connections to localhost:7474. Does execute_query not reuse connections by design, or is this an issue?
I've now tried parameters and the behavior is the same. How would you code this kind of query using batches, and how do I make sure the index on npi is used?
With Neo4j 2.1.3, the LOAD CSV issue is resolved:
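A rough sketch of what that could look like for your referral data; the file URL, column names, and batch size are assumptions you'll need to adapt:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///path/to/referrals.csv" AS row
MATCH (a:Provider {npi: row.from_npi}), (b:Provider {npi: row.to_npi})
MERGE (a)-[:REFERS_TO {qty: toInt(row.qty)}]->(b);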
In your Ruby code you should use Cypher parameters and probably the transactional API. Do you limit the concurrency of your requests somehow (e.g. a single client)?
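A minimal parameterized version with Neography might look like the sketch below (the file name and column names are assumptions; execute_query takes the parameter hash as its second argument):

require 'csv'
require 'neography'

@neo = Neography::Rest.new("http://localhost:7474")

# Using Cypher parameters lets Neo4j cache the query plan and use the index on :Provider(npi)
# for both lookups, instead of re-parsing a new literal query for every row.
cypher = "MATCH (a:Provider {npi: {from_npi}}), (b:Provider {npi: {to_npi}}) " \
         "MERGE (a)-[:REFERS_TO {qty: {qty}}]->(b)"

CSV.foreach("referrals.csv", headers: true) do |row|
  @neo.execute_query(cypher,
                     "from_npi" => row["from_npi"],
                     "to_npi"   => row["to_npi"],
                     "qty"      => row["qty"].to_i)
end

For real batching you could group a few hundred rows and send them together through the transactional Cypher endpoint rather than issuing one HTTP request per row, which also avoids exhausting ephemeral ports like you're seeing in netstat.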
Also make sure you have an index or constraint created for your providers, e.g.:
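CREATE INDEX ON :Provider(npi);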
or
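CREATE CONSTRAINT ON (p:Provider) ASSERT p.npi IS UNIQUE;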