Noe4j : how to get property names with count of non-null or non-empty value for a given node label

105 Views Asked by At

So there is a requirement of counting the number of occurrences of non null and non empty values of all properties for a given node label.

For example,

MATCH(p:Person) 
WHERE p.name IS NOT NULL and p.name <> "" 
RETURN COUNT(p.name);

This gives me a count for name properties for node label Person I have to get the count of such other 200 properties for this node label. I am able to get the count of the properties but I am not sure for conditional count like the above. Anything would be helpful as I have just started learning neo4j.

The query I am using for all properties count is

MATCH(p:Person)
WITH p
UNWIND keys(p) as key
WITH key, COUNT(keys(p)) as cnt
RETURN DISTINCT key, cnt
ORDER BY key

What needs to be modified in the query? Or my query is wrong to start with. Thanks in advance.

2

There are 2 best solutions below

4
Finbar Good On BEST ANSWER

If you are inferring the possible properties from the Person nodes in the MATCH, then you can get the count of non-empty properties with the following:

MATCH (p:Person)
UNWIND keys(p) AS key
WITH key, CASE WHEN p[key] <> "" THEN 1 ELSE 0 END AS isNonEmpty
RETURN key, sum(isNonEmpty) AS cnt
ORDER BY key

Note that because Neo4j treats null properties as non-existent, then if none of the nodes returned by the MATCH statement has the property, those properties can't be inferred.

The CASE can easily be adapted to exclude other property values from the count. For example, to exclude empty lists, you can write:

MATCH (p:Person)
UNWIND keys(p) AS key
WITH key, 
     CASE p[key]
       WHEN [] THEN 0 
       WHEN "" THEN 0
       WHEN 0 THEN 0
       ELSE 1 
     END AS isNonEmpty
RETURN key, sum(isNonEmpty) AS cnt
ORDER BY key

If you know the properties in advance because, for example, you know the schema, then you could provide them as a list or set of records. For example, if the properties were 'a' through to 'g', you could use the following:

UNWIND ['a', 'b', 'c', 'd', 'e', 'f', 'g'] AS key
MATCH (p)
WITH key, CASE WHEN p[key] <> "" THEN 1 ELSE 0 END AS isNonEmpty
RETURN key, sum(isNonEmpty) AS cnt
ORDER BY key
2
cybersam On

[UPDATED]

Aggregating functions like COUNT already group by non-aggregating expressions like key, so DISTINCT is not needed. And comparisons to NULL are considered false by WHERE, so an explicit NULL test would be redundant.

Here is a query that uses:

  • ISEMPTY to detect all "empty" properties (with "", [], or {} values).
  • TOINTEGER to convert booleans to 1 (for true) or 0.

It will also return a 0 count for properties that are always empty.

MATCH (p:Person)
UNWIND KEYS(p) AS key
RETURN key, SUM(TOINTEGER(NOT ISEMPTY(p[key]))) AS cnt 
ORDER BY key

You should PROFILE the queries in the proposed answers to your question see which one uses the fewest DB hits.

Also, if it makes sense, you should consider removing all properties that have an empty string value if they should be considered as non-existent. That would allow you to use evern simpler and faster queries.