I have loaded a hierarchical tree (DAG) of DNA SNPs. I want to identify lowest common ancestors.
This query works, yielding the single correct node:
Match (n:SNPNode{SNP:'R-Z11'}), (m:SNPNode{SNP:'R-BY13828'})
match path=(n)-[:SNPParent*..99]->(MRCA)<-[:SNPParent*..99]-(m)
return MRCA.SNP
However, this one yields no result:
Match (n:SNPNode{SNP:'R-Z11'}), (m:SNPNode{SNP:'R-S25289'})
match path=(n)-[:SNPParent*..99]->(MRCA)<-[:SNPParent*..99]-(m)
return MRCA.SNP
even though the two queries that list the ancestors of each node return many nodes in common:
MATCH p=(n:SNPNode{SNP:'R-Z11'})-[r:SNPParent*..66]->(m) RETURN m.SNP
m.SNP
R-Z338
R-Z8
R-Z7
R-Z2
R-Z345
R-Z27
R-Z30
R-Z9
R-L48
R-Z301
R-Z381
R-U106
R-L151
R-L51
R-L23
R-M269
R-P297
R-L389
R-L754
R-M343
and
MATCH p=(n:SNPNode{SNP:'R-Z25289'})-[r:SNPParent*..66]->(m) RETURN m.SNP
m.SNP
R-S16701
R-S1774
R-Z341
**R-Z11**
R-Z338
R-Z8
R-Z7
R-Z2
R-Z345
R-Z27
R-Z30
R-Z9
R-L48
R-Z301
R-Z381
R-U106
R-L151
R-L51
R-L23
R-M269
R-P297
R-L389
R-L754
R-M343
It seems the problem is that R-Z11 lies on the path returned by the second query and is itself the common ancestor. In other words, sometimes the LCA is one of the two nodes, sitting at the end of the shortest path between them. Is there a way to address this so that R-Z11 is returned as the result whether or not it is in the shortest path?
I think you'll want to ensure your variable-length paths have a lower bound of 0 (when you omit the lower bound, as in your current queries, it defaults to 1). This will make it possible for the start and end nodes themselves to be considered as possible matches for MRCA.
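As a minimal sketch, here is the failing query from the question with only the lower bounds changed to 0. The labels, property names, and SNP values are copied from the question; I haven't run this against your data, so treat it as a starting point rather than a verified fix.

```cypher
// Same pattern as the original query, but *0..99 allows zero hops,
// so n or m itself can be bound to MRCA.
MATCH (n:SNPNode {SNP: 'R-Z11'}), (m:SNPNode {SNP: 'R-S25289'})
MATCH path = (n)-[:SNPParent*0..99]->(MRCA)<-[:SNPParent*0..99]-(m)
RETURN MRCA.SNP
```

Cypher will not traverse the same relationship twice within a single matched pattern, so in a strict tree the ancestors above the true LCA should still be excluded, as they are in your working query. If some nodes have more than one parent, more than one row may come back, and you may need to filter further.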