Neo4j similarity of single node with entire graph

I'm trying to use GDS in Neo4j to calculate similarities. I understand how to get GDS to calculate all the similarities in the in-memory graph, but that answers the question "over the whole graph, what is the similarity of each pair of nodes?" My question is different: "given this node N, what is the similarity of N with every other node?" Obviously the latter should be much faster. I tried to express it with a query like this:

CALL gds.nodeSimilarity.stream('test', { relationshipWeightProperty: 'strength', similarityCutoff: 0.1 })
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS n1, gds.util.asNode(node2) AS n2, similarity
WHERE n1.name = "Chair1"
RETURN n1.name, n2.name, similarity
ORDER BY n1.name

But what is really happening under the hood? Is GDS (A) calculating all the similarities between every node1 and node2 and then filtering the results down to Chair1, or (B) only calculating the similarities between Chair1 and every other node? I need behaviour B, but after some testing with the airport databases the execution time seems to be shorter without the WHERE clause than with it, so I suspect it is behaviour A. Is there a way to force behaviour B?

Best answer

According to a comment from a Neo4j developer, for the query above GDS currently calculates all the similarities and then post-filters the results: the WHERE clause is applied to the result stream of the node similarity algorithm, so it does not reduce the work the algorithm does.

More sophisticated filters are due to be released with GDS version 2.1; in the meantime, this answer may clarify the behaviour for others. Cheers!
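
For readers on a later release, here is a rough sketch of what behaviour B might look like. It assumes GDS 2.1 or newer, where a filtered variant of Node Similarity was added in the alpha tier as `gds.alpha.nodeSimilarity.filtered.stream` with a `sourceNodeFilter` option; the procedure name and tier may differ in your version, so check the documentation before relying on it. The graph name 'test', the 'strength' property and the 'Chair1' name are taken from the question; the unlabelled MATCH is only there because the question does not say which label the node has.

```cypher
// Assumption: GDS 2.1+ with the (alpha-tier) filtered Node Similarity procedure installed.
// Look up the internal id(s) of the source node first; add a label to the MATCH if you have one.
MATCH (c {name: 'Chair1'})
WITH collect(id(c)) AS sourceIds

// Only pairs whose first node is in sourceIds are computed and returned.
CALL gds.alpha.nodeSimilarity.filtered.stream('test', {
  relationshipWeightProperty: 'strength',
  similarityCutoff: 0.1,
  sourceNodeFilter: sourceIds
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS source,
       gds.util.asNode(node2).name AS target,
       similarity
ORDER BY similarity DESC
```

Unlike the WHERE clause in the question, the filter here is part of the algorithm configuration, so it should restrict which source nodes are compared rather than trimming the result stream afterwards.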