Logical node deletion in Neo4j embedded


I have the following graph in an embedded Neo4j instance:

[Image: the graph, with person nodes such as "Alice" and "Bob" connected by GREETS relationships]

I want to find all the people who are not greeted by anyone else. That's simple enough: MATCH (n) WHERE NOT ()-[:GREETS]->(n) RETURN n.

However, whenever I find non-greeted people, I want to remove those nodes from the db and repeat the query for as long as it matches one or more nodes. In other words, starting from the graph in the picture, I want to:

  1. Run the query, which returns "Alice"
  2. Remove "Alice" from the db
  3. Run the query, which returns "Bob"
  4. Remove "Bob" from the db
  5. Run the query, which returns no matches
  6. Return the names "Alice" and "Bob"

Moreover, I want to execute this algorithm without actually removing any nodes from the database - i.e., a sort of "logical deletion".
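
To make the intended behaviour concrete, here is a minimal in-memory sketch of the steps above, independent of Neo4j. The class and method names (LogicalPeel, peelNotGreeted) are made up for illustration. Only "Alice" and "Bob" are named in the text, so the other people ("Carol", "Dave") and all the edges are an assumed stand-in for the pictured graph.

```java
import java.util.*;

class LogicalPeel {
    // Returns the removed names in removal order. The input map (person -> people
    // that person greets) is only read, never modified.
    static List<String> peelNotGreeted(Map<String, List<String>> greets) {
        // Count incoming GREETS edges for every person that appears in the graph.
        Map<String, Integer> inDegree = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : greets.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String target : e.getValue()) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }

        // Start with everyone who is not greeted by anyone.
        Deque<String> queue = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                queue.add(e.getKey());
            }
        }

        // "Logically delete" one person at a time by decrementing the counters
        // of the people they greet instead of touching the graph itself.
        List<String> removed = new ArrayList<>();
        while (!queue.isEmpty()) {
            String person = queue.poll();
            removed.add(person);
            for (String target : greets.getOrDefault(person, Collections.emptyList())) {
                if (inDegree.merge(target, -1, Integer::sum) == 0) {
                    queue.add(target);
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // Assumed example graph: Alice -> Bob -> Carol <-> Dave.
        Map<String, List<String>> greets = new HashMap<>();
        greets.put("Alice", Arrays.asList("Bob"));
        greets.put("Bob", Arrays.asList("Carol"));
        greets.put("Carol", Arrays.asList("Dave"));
        greets.put("Dave", Arrays.asList("Carol"));
        System.out.println(peelNotGreeted(greets)); // prints [Alice, Bob]
    }
}
```

In this sketch "Carol" and "Dave" keep greeting each other, so neither is ever removed, which matches step 5 (the query returns no matches).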

One solution I have found is to not call success() on the transaction, so that node deletions are not committed to the db, as in the following code:

import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

import java.io.File;
import java.util.*;

public class App 
{
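    // Java does not expand "~" in paths, so resolve the home directory explicitly.
    static String dbPath = System.getProperty("user.home") + "/neo4j/data/databases/graph.db";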

    private enum RelTypes implements RelationshipType { GREETS }

    public static void main(String[] args) {
        File graphDirectory = new File(dbPath);
        GraphDatabaseService graph = new GraphDatabaseFactory().newEmbeddedDatabase(graphDirectory);

        Set<String> notGreeted = new HashSet<>();

        try (Transaction tx = graph.beginTx()) {
            while (true) {
                Node notGreetedNode = getFirstNode(graph, "MATCH (n) WHERE NOT ()-[:GREETS]->(n) RETURN n");
                if (notGreetedNode == null) {
                    break;
                }
                notGreeted.add((String) notGreetedNode.getProperty("name"));
                detachDeleteNode(graph, notGreetedNode);
            }

            // Here I do NOT call tx.success()
        }

        System.out.println("Non greeted people: " + String.join(", ", notGreeted));

        graph.shutdown();
    }

    private static Node getFirstNode(GraphDatabaseService graph, String cypherQuery) {
        try (Result r = graph.execute(cypherQuery)) {
            if (!r.hasNext()) {
                return null;
            }

            Collection<Object> nodes = r.next().values();

            if (nodes.size() == 0) {
                return null;
            }

            return (Node) nodes.iterator().next();
        }
    }

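    private static boolean detachDeleteNode(GraphDatabaseService graph, Node node) {
        // Pass the node id as a query parameter instead of formatting it into the
        // query text, so the same query string is reused on every iteration.
        final String query = "MATCH (n) WHERE ID(n) = $id DETACH DELETE n";
        final Map<String, Object> params = Collections.singletonMap("id", node.getId());

        try (Result r = graph.execute(query, params)) {
            return true;
        }
    }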
}

This code works correctly and prints "Non greeted people: Bob, Alice".

My question is: does this approach (i.e. keeping a series of db operations within an open transaction) have any drawbacks that I should be aware of (e.g. potential memory issues)? Are there other approaches I could use to accomplish this?

I have also considered using a boolean property on the nodes to mark them as either deleted or not deleted. My concern is that the actual application I'm working on contains thousands of nodes and various kinds of relationships, and the actual queries are non-trivial, so I'd rather not change them to accommodate a soft-deletion boolean property (but I am open to doing that, if that turns out to be the best approach).

Also, please note that I am not simply looking for nodes that are not in cycles. Rather, the underlying idea is as follows: I have some nodes that satisfy a certain condition c; I want to (logically) remove those nodes; this will potentially make new nodes satisfy the same condition c, and so on, until the set of nodes that satisfy c is empty.
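
As a rough illustration of that general idea, here is a sketch in plain Java that takes the condition c as a predicate over a node and the set of nodes still remaining. The names (FixpointRemoval, removeUntilStable) and the example data are hypothetical, not part of my application. Unlike my Neo4j code, which removes one node per query, this sketch removes every matching node in each round. For a condition like "not greeted by anyone remaining", which can only become true as nodes disappear, both orders should end with the same set.

```java
import java.util.*;
import java.util.function.BiPredicate;

class FixpointRemoval {
    // Repeatedly removes every node that satisfies c with respect to the nodes
    // still remaining, until no node satisfies c. Returns the removed nodes in
    // removal order; the input collection is left untouched.
    static <T> Set<T> removeUntilStable(Collection<T> nodes, BiPredicate<T, Set<T>> c) {
        Set<T> remaining = new LinkedHashSet<>(nodes);
        Set<T> removed = new LinkedHashSet<>();
        while (true) {
            // Collect this round's matches first, so that c always sees a
            // consistent view of the remaining nodes.
            List<T> round = new ArrayList<>();
            for (T node : remaining) {
                if (c.test(node, Collections.unmodifiableSet(remaining))) {
                    round.add(node);
                }
            }
            if (round.isEmpty()) {
                return removed;
            }
            remaining.removeAll(round);
            removed.addAll(round);
        }
    }

    public static void main(String[] args) {
        // Assumed example data: for each person, the people who greet them.
        Map<String, List<String>> greetedBy = new HashMap<>();
        greetedBy.put("Alice", Collections.emptyList());
        greetedBy.put("Bob", Arrays.asList("Alice"));
        greetedBy.put("Carol", Arrays.asList("Bob", "Dave"));
        greetedBy.put("Dave", Arrays.asList("Carol"));

        // Condition c: nobody who is still remaining greets this person.
        Set<String> removed = removeUntilStable(
                Arrays.asList("Alice", "Bob", "Carol", "Dave"),
                (person, rem) -> greetedBy.get(person).stream().noneMatch(rem::contains));
        System.out.println(removed); // prints [Alice, Bob]
    }
}
```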
