Neo4j count relationships of each type for given node


I want to count the relationships for each type of a given start node. I have constructed two possible queries to achieve that, but I don't know which one is going to be more efficient when dealing with lots of relationships.

  1. Cypher only query with count()
MATCH (n) WHERE id(n) = 0
CALL {
    WITH n
    MATCH (n)<-[r]-()
    RETURN '<'+TYPE(r) AS type, COUNT(r) AS count
UNION ALL 
    WITH n
    MATCH (n)-[r]->()
    RETURN TYPE(r)+'>' AS type, COUNT(r) AS count
}
RETURN type, count

Result:

╒════════════╤═════╕
│type        │count│
╞════════════╪═════╡
│"<ACTED_IN" │5    │
├────────────┼─────┤
│"<PRODUCED" │1    │
├────────────┼─────┤
│"<DIRECTED" │2    │
└────────────┴─────┘

  2. Cypher + APOC, using apoc.node.relationship.types() and apoc.node.degree.[in|out]()
MATCH (n) WHERE id(n) = 0
WITH n, apoc.node.relationship.types(n) AS types
CALL {
    WITH n, types
    UNWIND types as type
    RETURN '<'+type AS type, apoc.node.degree.in(n, type) as count
UNION ALL 
    WITH n, types
    UNWIND types as type
    RETURN type+'>' AS type, apoc.node.degree.out(n, type) as count
}
RETURN type, count

Result:

╒════════════╤═════╕
│type        │count│
╞════════════╪═════╡
│"<ACTED_IN" │5    │
├────────────┼─────┤
│"<DIRECTED" │2    │
├────────────┼─────┤
│"<PRODUCED" │1    │
├────────────┼─────┤
│"ACTED_IN>" │0    │
├────────────┼─────┤
│"DIRECTED>" │0    │
├────────────┼─────┤
│"PRODUCED>" │0    │
└────────────┴─────┘

The second query also returns rows with a count of 0 for directions in which a relationship type does not occur, but these can be ignored.

I can only profile the first, Cypher-only query, because custom procedures like APOC can't be profiled.


There are 2 best solutions below

Best answer

There is actually a faster approach that also fixes a potential problem in your current queries. If n has any self-relationships (relationships that both start and end at n), each of them is counted twice, once as an inbound and once as an outbound relationship. In other words, the sum of the counts could be greater than the actual number of relationships.

This query should be fast and also solve the self-relationship problem:

MATCH (n)-[r]-() WHERE id(n) = 0
RETURN
  CASE n WHEN ENDNODE(r) THEN '<' ELSE '' END +
  TYPE(r) +
  CASE n WHEN STARTNODE(r) THEN '>' ELSE '' END AS type,
  COUNT(*) AS count

An inbound REL relationship would be represented as <REL, an outbound one as REL>, and a self-relationship as <REL>. And the sum of the counts would equal the actual number of relationships.
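To try the three cases out, here is a minimal sketch on throwaway data. The Demo label and the name property are made up for illustration, and n is looked up by that property rather than by id(n). Going by the description above, the expected rows are one each for <REL, REL> and <REL>.

```cypher
// Statement 1: hypothetical sample data (the Demo label and name property are made up).
// n gets one inbound, one outbound and one self-relationship of type REL.
CREATE (n:Demo {name: 'n'}), (a:Demo {name: 'a'}), (b:Demo {name: 'b'})
CREATE (a)-[:REL]->(n), (n)-[:REL]->(b), (n)-[:REL]->(n);

// Statement 2: the same counting query, with n found via the made-up property.
MATCH (n:Demo {name: 'n'})-[r]-()
RETURN
  CASE n WHEN endNode(r) THEN '<' ELSE '' END +
  type(r) +
  CASE n WHEN startNode(r) THEN '>' ELSE '' END AS type,
  count(*) AS count;
```

The two statements are separated by semicolons so they can be run one after the other, for example in cypher-shell. The sample nodes can be removed afterwards with MATCH (d:Demo) DETACH DELETE d.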

Including end node labels

Here is a slightly altered query that returns a count of each distinct combination of type and end node labels (a node can have multiple labels):

MATCH (n)-[r]-(m) WHERE id(n) = 0
RETURN
  CASE n WHEN ENDNODE(r) THEN '<' ELSE '' END +
  TYPE(r) +
  CASE n WHEN STARTNODE(r) THEN '>' ELSE '' END AS type,
  LABELS(m) AS endNodeLabels,
  COUNT(*) AS count

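Reading how aggregating functions work will help you understand these 2 queries: every non-aggregated expression in the RETURN clause acts as a grouping key, and COUNT(*) is computed once per distinct combination of those keys.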
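As a simpler illustration of that grouping behaviour, the sketch below drops the direction markers and counts per relationship type only. It assumes, as in the question, that the node of interest has id 0.

```cypher
// type(r) is not an aggregate, so Cypher uses it as the grouping key;
// count(*) is then evaluated once per distinct value of type(r).
MATCH (n)-[r]-() WHERE id(n) = 0
RETURN type(r) AS type, count(*) AS count
```

Adding a further non-aggregated expression to the RETURN clause, such as the direction-marked string or LABELS(m) in the queries above, splits the groups further.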

Second answer

Because you're only counting relationships for a single node, either version will take milliseconds, so performance isn't relevant. It would only become a consideration if the counts were being computed for many nodes. In any case, since the APOC version can't be profiled, the best way to compare the two is to actually measure them.

I compared the following Cypher and APOC versions for different numbers of relationships (the actual MATCH double counts, but that doesn't matter for this):

Cypher

MATCH (n)-[r]-()
RETURN 
  CASE 
    WHEN (startNode(r) = n) THEN type(r) + '>' 
    ELSE '<' + type(r)
  END AS type, 
  count(r) AS count

APOC

MATCH (n)
CALL {
    WITH n
    WITH n, apoc.node.relationship.types(n) AS types
    CALL {
        WITH n, types
        UNWIND types as type
        RETURN '<'+type AS type, apoc.node.degree.in(n, type) as count
    UNION ALL 
        WITH n, types
        UNWIND types as type
        RETURN type+'>' AS type, apoc.node.degree.out(n, type) as count
    }
    RETURN type, count
}
RETURN type, sum(count)

Here is a comparison (based on running the DB locally):

[Image: comparison of the Cypher and APOC queries for different numbers of relationships; not reproduced here]

There may be a more efficient version of the APOC query, but in any case, the Cypher version here is faster.