I have a very large graph. My objective is the following:
Return the pathways that are related to the target 'GIPR' and also related to compounds, where those compounds are related to the disease 'Leukemia'.
My query is the following:
MATCH (d:Disease {Name: 'Leukemia'})
CALL apoc.path.expandConfig(d, {minLevel: 1, maxLevel: 5, labelFilter: '/Compound', bfs: false}) YIELD path
WITH [node in nodes(path) WHERE node:Compound] as S
UNWIND S as c
CALL apoc.path.expandConfig(c, {minLevel: 1, maxLevel: 5, labelFilter: '/Pathway', bfs: false}) YIELD path
WITH [node in nodes(path) WHERE node:Pathway] as A
MATCH (t:Target {Name: 'GIPR'})
CALL apoc.path.expandConfig(t, {minLevel: 1, maxLevel: 4, labelFilter: '/Pathway', bfs: false}) YIELD path
WITH A, [node in nodes(path) WHERE node:Pathway] as B
WITH apoc.coll.intersection(A,B) as combined
UNWIND combined as Result
RETURN Result
The problem is that I keep getting repeated nodes, even though apoc.coll.intersection should avoid that. I have also tried apoc.coll.toSet, but the problem persists. If I use DISTINCT, I would have to wait for the whole traversal to finish before the engine removes the duplicates, which is simply not an option given the current size of the graph.
Maybe there is a way of manipulating the traversal strategy so that it avoids returning those paths that end with the same node (uniqueness condition NODE_GLOBAL would apply to all the nodes).
You can only generate globally-distinct results after all the results have been obtained. Your query is just generating locally-distinct results, which is why it is returning duplicates.
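As a small illustration of that difference, here is a sketch that uses literal lists instead of your graph (the values are made up for the example, and it assumes APOC is installed):

```cypher
// Locally distinct: each row's list is de-duplicated on its own,
// so a value that appears in two different rows is still returned twice.
UNWIND [[1, 2, 2], [2, 3]] AS xs
UNWIND apoc.coll.toSet(xs) AS x
RETURN x;   // four rows: 1, 2, 2, 3

// Globally distinct: aggregate across all rows before de-duplicating.
UNWIND [[1, 2, 2], [2, 3]] AS xs
UNWIND xs AS x
RETURN collect(DISTINCT x) AS unique_values;   // one row containing 1, 2 and 3
```

In your query, apoc.coll.intersection plays the role of the first statement: it runs once per row, so each row's result is duplicate-free, but the same Pathway can still show up in many rows.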
If I understand your use case, you want to intersect all Leukemia compound pathways with GIPR pathways. If so, your query is very inefficient because it repeatedly traverses the DB to get the same set of GIPR pathways, when that should only be done once. Also, it needlessly scans the nodes in the paths returned by apoc.path.expandConfig for a desired label, even though your labelFilter values say that the desired label must only occur at the end of the path.
The following query may work for you and should be faster. Note that it uses aggregation and DISTINCT to get globally-unique A and B lists before doing a final intersection.
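A minimal sketch of a query along those lines is shown here. It reuses the labels, property names, and level limits from the question, takes only the last node of each path (which the '/Compound' and '/Pathway' termination filters guarantee has the desired label), and has not been run against the actual graph, so treat it as a starting point rather than a tested solution:

```cypher
// 1. Distinct compounds reachable from the disease.
MATCH (d:Disease {Name: 'Leukemia'})
CALL apoc.path.expandConfig(d, {minLevel: 1, maxLevel: 5, labelFilter: '/Compound', bfs: false}) YIELD path
WITH DISTINCT last(nodes(path)) AS c

// 2. Globally-unique list A of pathways reachable from those compounds.
CALL apoc.path.expandConfig(c, {minLevel: 1, maxLevel: 5, labelFilter: '/Pathway', bfs: false}) YIELD path
WITH collect(DISTINCT last(nodes(path))) AS A

// 3. Globally-unique list B of pathways reachable from the target.
//    A is a single row at this point, so this traversal runs only once.
MATCH (t:Target {Name: 'GIPR'})
CALL apoc.path.expandConfig(t, {minLevel: 1, maxLevel: 4, labelFilter: '/Pathway', bfs: false}) YIELD path
WITH A, collect(DISTINCT last(nodes(path))) AS B

// 4. Intersect the two lists once and return one row per pathway.
UNWIND apoc.coll.intersection(A, B) AS Result
RETURN Result
```

Because both collect(DISTINCT ...) steps are aggregations, each one still has to consume its whole traversal before the next step starts; the saving comes from traversing from each distinct compound once and from the GIPR node once, not from streaming partial results.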