I have a performance issue with bulk inserts into Neo4j.
I have a CSV file with 400k rows, which produces about 3.5 million nodes, and I use the LOAD CSV command with the latest version of Neo4j.
I've noticed that when I use a CREATE statement, the load takes about 4 minutes, and with no indexes at all, about 3.5 minutes.
My first question is whether this is a normal rate of nodes per minute.
Now, my real problem is that I need to use MERGE for data integrity reasons, and when I do, the load can take as long as 24 hours, even with indexes in place.
So two additional questions:
Is LOAD CSV the recommended approach for the best load performance?
And what can I do about this performance issue?
EDIT:
Here is the query:
LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS line FIELDTERMINATOR '|'
MERGE (session :Session { session:line.session })
MERGE (hit :Hit { key:line.key,date_time:line.date_time,session:line.session })
MERGE (user :User { id:line.user_id })
MERGE (session2 :Session2 { session2:line.session2 })
MERGE (country :Country { name:line.country })
MERGE (tv :TV { name:line.Model })
MERGE (transfer_protocol :Protocol { name:line.transfer_protocol })
MERGE (os :OS { name:line.os_name ,version:line.os_version, row_key:line.os_name+line.os_version})
Sample: session_guid|hit_key_guid|useridguid|session2_guid|PANASONIC|TCP|ANDROID|5.0
session, user, session2, country, tv, transfer_protocol, and os each have a unique constraint, and hit has an index.
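For reference, the schema setup looks roughly like this (a sketch in Neo4j 2.x/3.x syntax; putting the OS constraint on row_key is my assumption, since that's the composite key built in the query):

CREATE CONSTRAINT ON (s:Session) ASSERT s.session IS UNIQUE;
CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;
CREATE CONSTRAINT ON (s2:Session2) ASSERT s2.session2 IS UNIQUE;
CREATE CONSTRAINT ON (c:Country) ASSERT c.name IS UNIQUE;
CREATE CONSTRAINT ON (t:TV) ASSERT t.name IS UNIQUE;
CREATE CONSTRAINT ON (p:Protocol) ASSERT p.name IS UNIQUE;
CREATE CONSTRAINT ON (o:OS) ASSERT o.row_key IS UNIQUE;
CREATE INDEX ON :Hit(key);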
session and session2 can have many hits (1 to 100, 5 on average). hit_key_guid is different for each CSV line.
It's running really slowly on a pretty strong machine; each 1000 rows can take up to 10 seconds.
I also checked with the profiler, and there is no "Eager" operator in the plan.
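One variant I'm considering, following the usual advice for MERGE performance: merge each node on its constrained property only (so the lookup can use the index directly), set the remaining properties ON CREATE, and commit in batches. A sketch; the batch size of 1000 is a guess, and line.Model is the column name I'm assuming for the TV model:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///import.csv' AS line FIELDTERMINATOR '|'
// merge on the indexed/constrained property alone; set the rest on first creation
MERGE (session :Session { session:line.session })
MERGE (hit :Hit { key:line.key })
  ON CREATE SET hit.date_time = line.date_time, hit.session = line.session
MERGE (user :User { id:line.user_id })
MERGE (session2 :Session2 { session2:line.session2 })
MERGE (country :Country { name:line.country })
MERGE (tv :TV { name:line.Model })
MERGE (transfer_protocol :Protocol { name:line.transfer_protocol })
MERGE (os :OS { row_key:line.os_name + line.os_version })
  ON CREATE SET os.name = line.os_name, os.version = line.os_version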
Thanks,
Lior
It appears that the server-side code didn't create the indexes properly; once they were created, the load ran with good performance.
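For anyone who hits the same thing, it's worth confirming that the indexes and constraints actually exist and are ONLINE before loading: :schema in the Neo4j browser shows them, or (on 3.x) from Cypher:

// both procedures list state; every entry should report ONLINE
CALL db.indexes();
CALL db.constraints();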