How to approach graph modeling using Cypher - should I use property or node?


I have books and authors. In SQL, I would have two tables and then I would create relations between them.

How does this work in the graph world? Should books and authors be separate nodes or are authors just additional node properties?

I've come up with the following code, but I'm not sure if it is redundant. I stored the author both as a property on each Book node and as a relationship from a separate Author node.

CREATE 
(b1:Book {title: "The Catcher in the Rye", author: "J.D. Salinger"}),
(b2:Book {title: "The Great Gatsby", author: "F. Scott Fitzgerald"}),
(b3:Book {title: "The Old Man and the Sea", author: "Ernest Hemingway"}),
(b4:Book {title: "For Whom The Bell Tolls", author: "Ernest Hemingway"}),
(a1:Author {name: "J.D. Salinger"}),
(a2:Author {name: "F. Scott Fitzgerald"}),
(a3:Author {name: "Ernest Hemingway"}),
(a1)-[:WROTE]->(b1),
(a2)-[:WROTE]->(b2),
(a3)-[:WROTE]->(b3),
(a3)-[:WROTE]->(b4)

Is adding authors to Book nodes redundant?

Best answer

The answer to this question in some ways is "it depends".

I think in general, you would start by not including the author names as properties, so you would just build a structure along the lines of:

(:Author {name: "x"})-[:WROTE]->(:Book {title: "y"})
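As a rough sketch of that structure applied to the data from the question (assuming the statements are run one after another in something like Neo4j Browser or cypher-shell), the Book nodes carry only a title and the author is reached through the relationship:

```cypher
// Authors exist only as nodes; Book nodes have no author property.
CREATE
  (a1:Author {name: "J.D. Salinger"}),
  (a2:Author {name: "F. Scott Fitzgerald"}),
  (a3:Author {name: "Ernest Hemingway"}),
  (a1)-[:WROTE]->(:Book {title: "The Catcher in the Rye"}),
  (a2)-[:WROTE]->(:Book {title: "The Great Gatsby"}),
  (a3)-[:WROTE]->(:Book {title: "The Old Man and the Sea"}),
  (a3)-[:WROTE]->(:Book {title: "For Whom The Bell Tolls"});

// The author of each book is found by traversing WROTE.
MATCH (a:Author)-[:WROTE]->(b:Book)
RETURN b.title AS title, a.name AS author
ORDER BY title;
```

Nothing is lost compared with the version in the question: every lookup that used the author property can follow the WROTE relationship instead.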

I have seen cases where it makes sense to also store the information as a property to avoid additional dereferences, but in general I would start with the structure above and only duplicate the information if a very good reason arises.

The reasons tend to be unique to particular implementations. In terms of general graph data modeling, I would start with the simple "Author wrote Book" structure. That also has the advantage that there is only one version of the information to maintain (in this case, the edge between the author and the book).
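To illustrate the maintenance point, here is a hypothetical correction to an author's name (the new spelling is just an example value). With Author nodes, one update is enough; with the name also copied onto Book nodes, each copy has to be kept in sync separately:

```cypher
// Author modeled as a node: a single update covers every book they wrote.
MATCH (a:Author {name: "J.D. Salinger"})
SET a.name = "Jerome David Salinger";

// Author also duplicated as a Book property: the same change must be
// repeated on every Book that carries the old value.
MATCH (b:Book {author: "J.D. Salinger"})
SET b.author = "Jerome David Salinger";
```

If the second statement is forgotten, the property and the relationship disagree, which is the kind of drift the single-source structure avoids.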