Neo4j WHERE causes duplicates?


I'm running Neo4j Desktop v1.4.1; the database is Neo4j 4.2.1 Enterprise.

I have a simple graph of placements, campaigns, and a placement-to-campaign "contains" relationship. This is a fresh dataset and every node is unique. Some placements "contain" thousands of campaigns, so I want to filter the returned campaigns by an inclusion list of campaign ids.

When I return all the matched nodes it works:

neo4j@neo4j> MATCH (:Placement {id: 5})-[:CONTAINS]->(c:Campaign)
             WHERE c.id IN [400,263,150470,25810,37578]
             RETURN *;
+--------------------------+
| c                        |
+--------------------------+
| (:Campaign {id: 37578})  |
| (:Campaign {id: 263})    |
| (:Campaign {id: 25810})  |
| (:Campaign {id: 150470}) |
+--------------------------+

When I return just c.id, I get the same id repeated once per matched row instead of the four distinct ids:

neo4j@neo4j> MATCH (:Placement {id: 5})-[:CONTAINS]->(c:Campaign)
             WHERE c.id IN [400,263,150470,25810,37578]
             RETURN c.id;
+--------+
| c.id   |
+--------+
| 150470 |
| 150470 |
| 150470 |
| 150470 |
+--------+

There is only one CONTAINS relationship between placement 5 and campaign 150470:

neo4j@neo4j> MATCH (:Placement {id: 5})-[rel:CONTAINS]->(:Campaign {id:150470}) 
             RETURN count(rel);
+------------+
| count(rel) |
+------------+
| 1          |
+------------+

EXPLAIN returns the following query plan. Could the cache[c.id] be the culprit?

+---------------------------+------------------------------------------------------------------------------------------------------+----------------+---------------------+
| Operator                  | Details                                                                                              | Estimated Rows | Other               |
+---------------------------+------------------------------------------------------------------------------------------------------+----------------+---------------------+
| +ProduceResults@neo4j     | `c.id`                                                                                               |              4 | Fused in Pipeline 1 |
| |                         +------------------------------------------------------------------------------------------------------+----------------+---------------------+
| +Projection@neo4j         | cache[c.id] AS `c.id`                                                                                |              4 | Fused in Pipeline 1 |
| |                         +------------------------------------------------------------------------------------------------------+----------------+---------------------+
| +Expand(Into)@neo4j       | (anon_7)-[anon_27:CONTAINS]->(c)                                                                     |              4 | Fused in Pipeline 1 |
| |                         +------------------------------------------------------------------------------------------------------+----------------+---------------------+
| +MultiNodeIndexSeek@neo4j | UNIQUE anon_7:Placement(id) WHERE id = $autoint_0, cache[c.id], UNIQUE c:Campaign(id) WHERE id IN $a |             25 | In Pipeline 0       |
|                           | utolist_1, cache[c.id]                                                                               |                |                     |
+---------------------------+------------------------------------------------------------------------------------------------------+----------------+---------------------+

Edit: if I prepend the query with CYPHER runtime=SLOTTED I get the expected output:

+--------+
| c.id   |
+--------+
| 37578  |
| 263    |
| 25810  |
| 150470 |
+--------+
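
For reference, this is roughly what the prefixed query looks like. It is the same query as above with only the runtime hint added; treat it as a workaround sketch, not a confirmed fix for whatever the default runtime is doing:

```cypher
// Same MATCH/WHERE/RETURN as the failing query; only the runtime hint is new.
// Assumption: the hint forces the slotted runtime instead of the default one chosen by the planner.
CYPHER runtime=SLOTTED
MATCH (:Placement {id: 5})-[:CONTAINS]->(c:Campaign)
WHERE c.id IN [400,263,150470,25810,37578]
RETURN c.id;
```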

If I omit the WHERE clause, I get unique campaign ids (but too many). I feel like I'm missing something obvious, but I've read the Neo4j docs and I'm not getting it. Thanks!
