I am currently having a performance issue with a Neo4j query.
Here is the problem. I need to find users in the database from a large list. A user matches if the uniqCode matches, OR if both the name and the location (zip) match.
Then I want to be able to merge this user with a node I create.
The query below works, but it takes between 20 and 30 seconds for a list of 30 users, and in the real case I will need to pass a list of 5,000 to 10,000 users.
Note that I have indexed uniqCode and name on the User nodes.
UNWIND $users as row
MATCH (u:User)
WHERE u.uniqCode = row.uniqCode
   OR (
     apoc.text.clean(u.name) = row.name
     AND EXISTS ((u)-[:IS]->(:Zip {name:row.zip}))
   )
MERGE (u)<-[:IS]-(a:ParallelUser {id:row.uuid, name: u.name, uniqCode: row.uniqCode})
RETURN {name: a.name, uniqCode: a.uniqCode, id: a.id} AS ParallelUser
The params look like this:
[{uniqCode: "1234", name: "John Doe", zip: "1234", uuid: "1234"}, ...]
Thank you in advance for your help...
It would be good if you could use an index for the MATCH clause in your query. You can check whether the query planner is using any indexes by prepending PROFILE to the query and running it. Feel free to post the results back here for more detailed discussion.
The query tuning documentation might be helpful to you.
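As a rough sketch of that check, the query from the question can be run unchanged with PROFILE in front of it. In the resulting plan, an operator such as NodeIndexSeek on :User suggests an index is being used, while NodeByLabelScan suggests every User node is read for each row; the exact operators shown depend on the Neo4j version and the data, so treat this as an illustration rather than the expected output.

```cypher
// PROFILE executes the query and reports each plan operator with its rows and db hits.
// Because the query really runs, the MERGE below still writes to the database.
PROFILE
UNWIND $users AS row
MATCH (u:User)
WHERE u.uniqCode = row.uniqCode
   OR (
     apoc.text.clean(u.name) = row.name
     AND EXISTS ((u)-[:IS]->(:Zip {name: row.zip}))
   )
MERGE (u)<-[:IS]-(a:ParallelUser {id: row.uuid, name: u.name, uniqCode: row.uniqCode})
RETURN {name: a.name, uniqCode: a.uniqCode, id: a.id} AS ParallelUser
```

Replacing PROFILE with EXPLAIN shows the planned operators without executing the query, which is safer when you do not want the MERGE to create nodes while investigating.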
I found this free course enlightening.
You won't be able to use an index on u.name if you have to wrap it in the apoc.text.clean() function. Can you run that function on the property before you store it, or else create a new cleanName property? Then you could create an index that includes that property.
On the MERGE portion of your query, I wonder if all three properties of ParallelUser are required to uniquely identify the node? If the id alone is sufficient, then you can rewrite the MERGE portion this way: