I'm new to ArangoDB with graphs. I simply want to know whether it is faster to build edges or to use DOCUMENT() for very simple 1:1 connections where querying the graph is not needed?
LET a = DOCUMENT(@from)
FOR v IN OUTBOUND a CollectionAHasCollectionB
RETURN MERGE(a,{b:v})
vs
LET a = DOCUMENT(@from)
RETURN MERGE(a, {b: DOCUMENT(a.bId)})
A simple benchmark you can try:
Create the collections products and categories and an edge collection has_category, then generate some sample data. Then compare the query times for the different approaches.
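The sample data could be generated along these lines. This is only a sketch: the attribute names categoryId and categoryName, the key scheme, and the counts (10 categories, 10,000 products) are assumptions chosen for illustration, and each query is meant to be run on its own.

```aql
// Query 1: a handful of categories
FOR i IN 1..10
  INSERT { _key: CONCAT("cat", i), name: CONCAT("Category #", i) } INTO categories
```

```aql
// Query 2: products that carry both a reference (categoryId, assumed name)
// and an embedded copy of the category name (categoryName, assumed name)
FOR i IN 1..10000
  LET c = (i % 10) + 1
  INSERT {
    _key: CONCAT("prod", i),
    name: CONCAT("Product #", i),
    categoryId: CONCAT("categories/cat", c),
    categoryName: CONCAT("Category #", c)
  } INTO products
```

```aql
// Query 3: one edge per product, pointing at its category
FOR p IN products
  INSERT { _from: p._id, _to: p.categoryId } INTO has_category
```

Storing all three representations side by side lets the same data set serve every query variant in the comparison.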
Graph traversal (depth 1..1):
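A depth-1 traversal over these collections might look like the following sketch. It assumes each product has one outgoing edge in has_category and that both products and categories have a name attribute; neither detail is spelled out above.

```aql
// For every product, follow its single outgoing edge to the category
FOR p IN products
  FOR v IN 1..1 OUTBOUND p has_category
    RETURN { product: p.name, category: v.name }
```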
Look-up in categories collection using DOCUMENT():
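A per-product DOCUMENT() lookup could be sketched as follows, assuming each product stores the full document ID of its category (for example "categories/cat1") in an attribute called categoryId. That attribute name is hypothetical.

```aql
// For every product, fetch the referenced category document by its ID
FOR p IN products
  RETURN { product: p.name, category: DOCUMENT(p.categoryId).name }
```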
Using the directly embedded category names:
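With the category name copied into each product, the query needs no second collection at all. The sketch below assumes the copy lives in an attribute called categoryName, which is a hypothetical name.

```aql
// Everything needed is already part of the product document
FOR p IN products
  RETURN { product: p.name, category: p.categoryName }
```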
Graph traversal is the slowest of the three. The lookup in another collection is faster than the traversal, but by far the fastest query is the one with embedded category names.
If you query the categories for just one or a few products, however, response times should be in the sub-millisecond range regardless of data model and query approach, so performance is not a concern there.
The graph approach should be chosen if you need to query for paths with variable depth, long paths, shortest path etc. For your use case, it is not necessary. Whether the embedded approach is suitable or not is something you need to decide:
Is it acceptable to duplicate information, and potentially have inconsistencies in the data? (If you want to change the category name, you need to change it in all product records instead of just one category document that products can refer to via the immutable ID.)
Is there a lot of additional information per category? If so, all that data needs to be embedded into every product document that has that category, basically trading memory / storage space for performance.
Do you need to retrieve a list of all (distinct) categories often? You can do this type of query very cheaply with the separate categories collection. With the embedded approach, it will be much less efficient, because you need to go over all products and collect the category info.
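To make the first and last of these questions concrete, here is a rough sketch of how renaming a category and listing all categories differ between the two models. The attribute name categoryName and the bind parameters @oldName, @newName and @categoryKey are assumptions for illustration; each query runs separately.

```aql
// Embedded model: a rename has to touch every product carrying the old name
FOR p IN products
  FILTER p.categoryName == @oldName
  UPDATE p WITH { categoryName: @newName } IN products
```

```aql
// Separate collection: a rename changes a single category document
UPDATE @categoryKey WITH { name: @newName } IN categories
```

```aql
// Separate collection: listing categories scans only the small collection
FOR c IN categories
  RETURN c.name
```

```aql
// Embedded model: every product is visited to collect the distinct names
FOR p IN products
  RETURN DISTINCT p.categoryName
```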
Bottom line: you should choose the data model and approach that fits your use case best. Thanks to ArangoDB's multi-model nature you can easily try another approach if your use case changes or you run into performance issues.