I'm trying to do something like the following with the py2neo module, to fetch information for a large number of nodes in a Neo4j database whose IDs I already know:
query = f'''
MATCH
(n:MY_LABEL)
OPTIONAL MATCH
(n) -- (u:OTHER_LABEL) // Won't always have a neighbor
WHERE
id(n) in [{','.join(very_long_list_of_nids)}]
RETURN
id(n) as nid,
n.feature1,
u.feature2
'''
resp = graph.run(query)
I have noticed that it is far faster to omit the WHERE clause entirely and filter after the query returns the content of every n:MY_LABEL node. Is there a more elegant way to do this?
For reference, very_long_list_of_nids is about 500k elements long (I have tried batching it into smaller 10k chunks and have the same problem), and the database contains 4M nodes and 10M edges.
You should:
- Put your WHERE clause right under your MATCH clause. Currently, your WHERE clause is under the OPTIONAL MATCH clause, so it does not restrict the first MATCH at all, and the ID filtering is only done after finding the relationships of all MY_LABEL nodes.
- Remove the :MY_LABEL qualification from the MATCH clause. If you already get the node by native ID, checking the label is unnecessary, and you are not using indexing anyway.
With both changes, the query should be much faster.
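Here is a minimal sketch of the two changes above, not a tested drop-in. It assumes `graph` is a py2neo `Graph` that is already connected. Passing the IDs as a `$ids` query parameter instead of interpolating them into the string is my own addition, and the names `QUERY`, `chunked`, and `fetch_features` are hypothetical. The 10k batch size is taken from the question.

```python
from typing import Any, Dict, Iterator, List

# WHERE sits directly under MATCH, so only the requested nodes are expanded
# by the OPTIONAL MATCH. The :MY_LABEL check is dropped because the nodes
# are already selected by native ID.
# $ids is a query parameter (an assumption; the question builds the list
# into the query string instead).
QUERY = """
MATCH (n)
WHERE id(n) IN $ids
OPTIONAL MATCH (n)--(u:OTHER_LABEL)
RETURN id(n) AS nid, n.feature1 AS feature1, u.feature2 AS feature2
"""


def chunked(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids` with at most `size` elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_features(graph: Any, ids: List[int],
                   batch_size: int = 10_000) -> List[Dict[str, Any]]:
    """Run QUERY once per batch of IDs and collect the rows as dicts."""
    rows: List[Dict[str, Any]] = []
    for batch in chunked(ids, batch_size):
        # Keyword arguments to graph.run are sent as query parameters.
        cursor = graph.run(QUERY, ids=batch)
        rows.extend(dict(record) for record in cursor)
    return rows
```

You would call it as `fetch_features(graph, very_long_list_of_nids)`, with the IDs as integers. Note that a node with several OTHER_LABEL neighbors produces one row per neighbor, just as in your original query.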
Also, if the relationships between MY_LABEL and OTHER_LABEL always flow in one direction, you should consider using a directional relationship pattern (either --> or <--) in your OPTIONAL MATCH clause, especially if your MY_LABEL nodes have other kinds of relationships that flow in the opposite direction.
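To illustrate the directional variant, here is a small sketch that builds the OPTIONAL MATCH clause for either direction. I do not know which way your relationships actually point, so the default of "out" is only an assumption, and the helper name `optional_match_clause` is hypothetical.

```python
def optional_match_clause(direction: str = "out") -> str:
    """Return the OPTIONAL MATCH clause for the chosen relationship direction."""
    patterns = {
        "out": "(n)-->(u:OTHER_LABEL)",  # relationships leaving n
        "in": "(n)<--(u:OTHER_LABEL)",   # relationships arriving at n
        "any": "(n)--(u:OTHER_LABEL)",   # undirected, as in the question
    }
    return "OPTIONAL MATCH " + patterns[direction]
```

Check the direction against a few known pairs first: if you pick the wrong arrow, the query silently returns null for u on every row.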