Neo4j lowest common ancestor node not found


I have loaded a hierarchical tree (DAG) of DNA SNPs. I want to identify lowest common ancestors.

This query works and yields the single correct node:

Match (n:SNPNode{SNP:'R-Z11'}), (m:SNPNode{SNP:'R-BY13828'})
match path=(n)-[:SNPParent*..99]->(MRCA)<-[:SNPParent*..99]-(m) 
return MRCA.SNP

However, this one yields no result:

Match (n:SNPNode{SNP:'R-Z11'}), (m:SNPNode{SNP:'R-S25289'})
match path=(n)-[:SNPParent*..99]->(MRCA)<-[:SNPParent*..99]-(m) 
return MRCA.SNP

even though the two queries below, which list the ancestors of each node, return sets that overlap:

MATCH p=(n:SNPNode{SNP:'R-Z11'})-[r:SNPParent*..66]->(m) RETURN m.SNP

m.SNP
R-Z338
R-Z8
R-Z7
R-Z2
R-Z345
R-Z27
R-Z30
R-Z9
R-L48
R-Z301
R-Z381
R-U106
R-L151
R-L51
R-L23
R-M269
R-P297
R-L389
R-L754
R-M343

and

MATCH p=(n:SNPNode{SNP:'R-Z25289'})-[r:SNPParent*..66]->(m) RETURN m.SNP

m.SNP
R-S16701
R-S1774
R-Z341
**R-Z11**
R-Z338
R-Z8
R-Z7
R-Z2
R-Z345
R-Z27
R-Z30
R-Z9
R-L48
R-Z301
R-Z381
R-U106
R-L151
R-L51
R-L23
R-M269
R-P297
R-L389
R-L754
R-M343

It seems the problem is that R-Z11 is in the path of the second query and is itself the common ancestor. In other words, sometimes the LCA is at the end of a shortest path. Is there a way to address this so that R-Z11 is returned as the result whether or not it is in the shortest path?


There are 2 best solutions below


Here is the query that works:

match p=(n:SNPNode{SNP:'R-Z11'})<-[:SNPChild*0..99]-(MRCA:SNPNode)-[:SNPChild*0..99]->(m:SNPNode{SNP:'R-BY13828'}) 
return MRCA.SNP

Or, to list every node on the path with a boolean flag marking the lowest common ancestor (MRCA):

match p=(n:SNPNode{SNP:'R-Z11'})<-[:SNPChild*0..99]-(MRCA:SNPNode)-[:SNPChild*0..99]->(m:SNPNode{SNP:'R-BY13828'})
unwind nodes(p) as pn
return case when pn.SNP=MRCA.SNP then True else False end as MRCA, pn.SNP

with this output:

MRCA   SNP
FALSE  R-Z11
FALSE  R-Z338
TRUE   R-Z8
FALSE  R-BY13828
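
To make the flagged output above easier to follow, here is a rough plain-Python model of what unwinding the path nodes produces. It is only a sketch: the `PARENT` map is a hypothetical three-link tree read off the sample output, not the real database, and `chain_up` and `flagged_path` are illustrative names, not Neo4j functions.

```python
# Hypothetical child -> parent links, read off the sample output above.
PARENT = {"R-Z11": "R-Z338", "R-Z338": "R-Z8", "R-BY13828": "R-Z8"}


def chain_up(node):
    """Return node followed by its ancestors, nearest first."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def flagged_path(n, m):
    """Walk n up to the first shared node, then down to m, flagging that node."""
    up_n, up_m = chain_up(n), chain_up(m)
    # First node on n's upward chain that also lies on m's upward chain.
    # Raises StopIteration if the two nodes share no ancestor.
    mrca = next(x for x in up_n if x in up_m)
    path = up_n[: up_n.index(mrca) + 1] + up_m[: up_m.index(mrca)][::-1]
    return [(node == mrca, node) for node in path]


for is_mrca, snp in flagged_path("R-Z11", "R-BY13828"):
    print(is_mrca, snp)
```

Because both chains start at the node itself, this model also covers the case where one of the two nodes is the common ancestor.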


I think you'll want to ensure your variable-length paths have a lower bound of 0 (when you omit the lower bound, as in your current queries, it defaults to 1). This will make it possible for the start and end nodes to be considered as possible matches to MRCA.

Match (n:SNPNode{SNP:'R-Z11'}), (m:SNPNode{SNP:'R-S25289'})
match path=(n)-[:SNPParent*0..99]->(MRCA)<-[:SNPParent*0..99]-(m) 
return MRCA.SNP
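
If it helps to see why the lower bound matters, here is a small plain-Python model of the pattern, not Neo4j itself. It assumes a toy tree in which every node has a single parent, with the chain order guessed from the ancestor listings in the question; `PARENT`, `path_edges` and `common_ancestor_matches` are made-up names for illustration. It also models the rule that Cypher does not reuse a relationship within one MATCH pattern, which would explain why only the lowest shared node comes back rather than every shared ancestor.

```python
# Toy child -> parent map. The chain order is an assumption read off the
# listings in the question, not the contents of the real database.
PARENT = {
    "R-S25289": "R-S16701",
    "R-S16701": "R-S1774",
    "R-S1774": "R-Z341",
    "R-Z341": "R-Z11",
    "R-Z11": "R-Z338",
    "R-Z338": "R-Z8",
    "R-BY13828": "R-Z8",
}


def path_edges(node, target):
    """Edges (child, parent) walked from node up to target, or None if unreachable."""
    edges = []
    current = node
    while current != target:
        parent = PARENT.get(current)
        if parent is None:
            return None
        edges.append((current, parent))
        current = parent
    return edges


def common_ancestor_matches(n, m, min_hops, max_hops=99):
    """Nodes x matching (n)-[*min..max]->(x)<-[*min..max]-(m), no edge used twice."""
    matches = []
    for x in sorted(set(PARENT) | set(PARENT.values())):
        left, right = path_edges(n, x), path_edges(m, x)
        if left is None or right is None:
            continue  # x is not an ancestor of both nodes
        if not (min_hops <= len(left) <= max_hops):
            continue
        if not (min_hops <= len(right) <= max_hops):
            continue
        if set(left) & set(right):
            continue  # the two arms would share a relationship
        matches.append(x)
    return matches


print(common_ancestor_matches("R-Z11", "R-BY13828", min_hops=1))
print(common_ancestor_matches("R-Z11", "R-S25289", min_hops=1))
print(common_ancestor_matches("R-Z11", "R-S25289", min_hops=0))
```

With a lower bound of 1 the second call finds nothing, because R-Z11 cannot match at zero hops and every higher ancestor would need the R-Z11 to R-Z338 link on both arms. With a lower bound of 0 the zero-length arm lets R-Z11 itself match.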