Cypher COLLECT makes UNWIND unwind in wrong order

2.3k Views Asked by At

Graph gist: http://gist.neo4j.org/?6182d024325343760cb4

I want to get a (longest) path in order and it works as expected until I add a COLLECT statement, is there something concerning Cypher and COLLECT that I just don't understand or is this a bug?

This query works as expected, returns the nodes in path in the correct order:

MATCH (n:Cable { name: 'Cable3' })-[:Connected_to*]-(port:Port)
OPTIONAL MATCH path=(port)-[:Connected_to*]-()
WITH nodes(path) AS parts, length(path) AS len
ORDER BY len DESC 
LIMIT 1 UNWIND parts AS part
RETURN part

This one, without the COLLECT statement, returns the nodes in the right order but also the nodes between the part and the parent (as expected).

MATCH (n:Cable { name: 'Cable3' })-[:Connected_to*]-(port:Port)
OPTIONAL MATCH path=(port)-[:Connected_to*]-()
WITH nodes(path) AS parts, length(path) AS len
ORDER BY len DESC
LIMIT 1 UNWIND parts AS part
OPTIONAL MATCH (part)<-[:Has*1..10]-(parent)
RETURN part, parent

This query does not work as expected, returns the nodes in the path in another order:

MATCH (n:Cable { name: 'Cable3' })-[:Connected_to*]-(port:Port)
OPTIONAL MATCH path=(port)-[:Connected_to*]-()
WITH nodes(path) AS parts, length(path) AS len
ORDER BY len DESC
LIMIT 1 UNWIND parts AS part
OPTIONAL MATCH (part)<-[:Has*1..10]-(parent)
RETURN part, LAST(COLLECT(parent))

Any insight would be appreciated.

1

There are 1 best solutions below

0
On BEST ANSWER

The UNWIND is actually still returning rows in the expected order. Instead, it is the aggregation function, COLLECT(), that is "rearranging" the result rows. neo4j does not guarantee that the result rows of an aggregation function will be in any particular order (without an ORDER BY clause).

Here is a workaround that avoids using aggregation. It might work for your specific use case, depending on your requirements. This query filters the second OPTIONAL MATCH so that it contains the longest possible sequence of Has relationships (as long as the longest sequence is <= 10 hops -- you should adjust this as needed, or consider eliminating the maximum). The query simply returns the most remote "ancestor" in each result row, and the rows will remain in the order your expect.

MATCH (n:Cable { name: 'Cable3' })-[:Connected_to*]-(port:Port)
OPTIONAL MATCH path=(port)-[:Connected_to*]-()
WITH nodes(path) AS parts, length(path) AS len
ORDER BY len DESC 
LIMIT 1 UNWIND parts AS part
OPTIONAL MATCH (part)<-[:Has*1..10]-(ancestor)
WHERE NOT (ancestor)<-[:Has]-()
RETURN part, ancestor;