After much trial and error and consultation of the documentation, I have been able to find the number of subgraphs in a collection using the query below. It seems to work OK. It is crafted to work in the presence of cycles, though only because I happen to have cycles right now - I may not in the future.
It's very inefficient, and I'm keen to get pointers on how I could craft this query better. Please don't suggest Pregel as a response; I'm aware that there are things that could be done better there. Unless the advice is "never use AQL, always use Pregel", I'm keen to learn to use AQL better, and this is just an example. (EDIT1: there is an improved query below now.)
LET finalArray = (
    FOR x IN components
        LET subResult = (
            FOR v IN 0..10 ANY x edges
                OPTIONS { "uniqueVertices": "global", "order": "bfs" }
                FILTER IS_SAME_COLLECTION("components", v._id)
                COLLECT keys = v._key INTO found
                RETURN DISTINCT keys
        )
        RETURN DISTINCT subResult
)
RETURN LENGTH(finalArray)
EDIT1:
After forcing myself to properly re-read the documentation on graph traversal (https://www.arangodb.com/docs/3.10/aql/graphs-traversals.html) and going through the "Foundations of Graph Databases" course on the ArangoDB website, I've made some functional fixes to the query, shown below.
The problem with the previous query was that paths were found via vertices in other collections, and the FILTER expression did not clean this up - in effect a bug. Explicitly permitting paths to use only vertices from the "components" collection (via the vertexCollections option) fixed this.
Still, I doubt this is performance-optimal:
LET finalArray = (
    FOR x IN components
        LET subResult = (
            FOR v, p IN 0..20 ANY x edges
                OPTIONS { "uniqueVertices": "global", "order": "bfs",
                          "vertexCollections": "components" }
                COLLECT keys = v._key
                RETURN DISTINCT keys
        )
        RETURN DISTINCT subResult
)
RETURN LENGTH(finalArray)
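One variant I've been toying with (untested, and I honestly don't know whether it's any faster) identifies each component by a canonical representative - the smallest _id reachable from the start vertex - instead of collecting the full vertex set per start vertex, and then counts the distinct representatives. It uses the same collections (components, edges) and the same depth assumption as above:
LET representatives = (
    FOR x IN components
        // Collect the _ids reachable from x within the "components" collection.
        // The minimum depth of 0 means x itself is included, so the list is never empty.
        LET reachable = (
            FOR v IN 0..20 ANY x edges
                OPTIONS { "uniqueVertices": "global", "order": "bfs",
                          "vertexCollections": "components" }
                RETURN v._id
        )
        // Every start vertex in the same component yields the same minimum _id.
        RETURN DISTINCT MIN(reachable)
)
RETURN LENGTH(representatives)
It still performs one traversal per vertex, so I suspect the fundamental cost is the same; it just avoids deduplicating whole vertex sets.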
Possibly Superfluous Project Context:
I have a two-to-three-week-old hobby project for testing out some ideas that I'd like to later apply to a real, fairly large-scale production environment. I'm new to both graph programming and ArangoDB, so it seems important to build good practices and efficiency into my project from the ground floor.
In its test form, my project has a handful of document collections and one edge collection. One collection is substantially larger than the others: it has 1000 elements in my test program, but I want to scale that to 1 million. Query performance is the determining factor in how usable this system will be. Obviously it's not the biggest graph in history, but I'll need to run a series of queries across the whole graph before each change to the graph (which represents a large, heterogeneous, tightly-coupled application environment and its deployment plane). So it's not real-time, but the series of queries has to converge in the time that a human operator is itching to make a potentially critical change to the environment.
The query I tried works, but I don't know what I don't know about making it better.