Best way to query a large list of nodes in neo4j

Asked by Zaya on 02 August 2023 at 17:24 (57 views)

I'm trying to do something like the following using the py2neo module to get information for a large number of nodes in a Neo4j database whose IDs I already know:

query = f'''    
    MATCH 
        (n:MY_LABEL)
    OPTIONAL MATCH 
        (n) -- (u:OTHER_LABEL) // Won't always have a neighbor
    WHERE 
        id(n) in [{','.join(very_long_list_of_nids)}]
    RETURN 
        id(n) as nid, 
        n.feature1,
        u.feature2
'''
resp = graph.run(query)

I have noticed that it is far faster to omit the WHERE clause entirely and filter on the client side after the query returns the content of every n:MY_LABEL node. Is there a more elegant way to do this?

For reference, the very_long_list_of_nids list is about 500k elements long (I have tried batching it into smaller chunks of 10k and have the same problem), and the database contains 4 million nodes and 10 million edges.

There is 1 best solution below

cybersam On 02 August 2023 at 20:33 BEST ANSWER

You should:

1. Move the WHERE clause directly under your MATCH clause. Currently, your WHERE clause is attached to the OPTIONAL MATCH clause, so it does not restrict the initial MATCH: every MY_LABEL node is still matched and has its relationships expanded, and the ID filter only affects which optional neighbors are kept.
2. Remove the :MY_LABEL qualification from the MATCH clause. When you look up a node by its native ID, checking the label is unnecessary, and the lookup does not use a label index anyway.
3. Pass the list of IDs as a parameter. The Cypher query planner will run much faster because the query text stays short, and once the plan is created it is cached and reused every time you rerun the query with a new ID list. It also makes your client code simpler and faster.

This should be much faster:

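# A plain string is enough: nothing is interpolated into the query text anymore.
query = '''
    MATCH 
        (n)
    WHERE 
        ID(n) in $id_list
    OPTIONAL MATCH 
        (n) -- (u:OTHER_LABEL) // Won't always have a neighbor
    RETURN 
        ID(n) as nid, 
        n.feature1,
        u.feature2
'''
# The IDs are sent as a query parameter. They must be integers to match ID(n),
# so convert them first if the list holds strings.
resp = graph.run(query, id_list=very_long_list_of_nids)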
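
If a single 500k-element parameter turns out to be unwieldy, the same parameterized query can be rerun over smaller batches, which is where the cached plan pays off. The following is only a sketch: the names chunked and fetch_in_batches are hypothetical helpers, the batch size of 10,000 is taken from the question, and run is assumed to be a callable shaped like graph.run(query, **params) that returns an iterable of rows.

```python
from typing import Any, Callable, Iterable, Iterator, List, Sequence

# Same query text for every batch, so the planner can reuse one cached plan.
QUERY = '''
    MATCH (n)
    WHERE id(n) IN $id_list
    OPTIONAL MATCH (n) -- (u:OTHER_LABEL)
    RETURN id(n) AS nid, n.feature1 AS feature1, u.feature2 AS feature2
'''


def chunked(ids: Sequence[Any], size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of at most `size` IDs, converted to int."""
    for start in range(0, len(ids), size):
        yield [int(nid) for nid in ids[start:start + size]]


def fetch_in_batches(
    run: Callable[..., Iterable[Any]],
    ids: Sequence[Any],
    size: int = 10_000,
) -> List[Any]:
    """Run QUERY once per batch and collect all returned rows in order."""
    rows: List[Any] = []
    for batch in chunked(ids, size):
        # Only the parameter value changes between calls, never the query text.
        rows.extend(run(QUERY, id_list=batch))
    return rows
```

With py2neo this would be called as fetch_in_batches(graph.run, very_long_list_of_nids), assuming the cursor returned by graph.run is iterated to collect its records.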

Also, if the relationships between MY_LABEL and OTHER_LABEL always flow in one direction, you should consider using a directional relationship pattern (either --> or <--) in your OPTIONAL MATCH clause, especially if your MY_LABEL nodes have other kinds of relationships that flow in the opposite direction.
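
To make the directional variant concrete, here is a minimal sketch that builds the query with an undirected, outgoing, or incoming pattern. The names PATTERNS and build_query are hypothetical, and which direction is correct depends on how the relationships were created in your data.

```python
# Hypothetical helper; only the OPTIONAL MATCH pattern differs between variants.
PATTERNS = {
    "either": "(n) -- (u:OTHER_LABEL)",   # original, undirected
    "out": "(n) --> (u:OTHER_LABEL)",     # relationships leaving n
    "in": "(n) <-- (u:OTHER_LABEL)",      # relationships arriving at n
}


def build_query(direction: str = "either") -> str:
    """Return the Cypher text for the chosen relationship direction."""
    # The pattern comes from a fixed table, never from user input;
    # an unknown direction raises KeyError.
    pattern = PATTERNS[direction]
    return (
        "MATCH (n)\n"
        "WHERE id(n) IN $id_list\n"
        f"OPTIONAL MATCH {pattern}\n"
        "RETURN id(n) AS nid, n.feature1, u.feature2"
    )
```

The result would be passed to graph.run together with the id_list parameter as before. Prefixing the query text with PROFILE is one way to compare how much work each variant does.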
