Vertex centric index not used by Arango

32 Views Asked by At

Given the following graph; Graph

Two collections exist;

  • The nodes collection - No values here
  • The edges edge collection - single boolean value - v

I have a persistent index on edges with the following fields: _to, v

Starting from nodes/a, I would like to retrieve all paths that have at least one edge with edge.v == true

Currently, I achieve it with the following query:

for v, e, p in 1..5 inbound "nodes/a" edges
    FILTER p.edges[*].v ANY == true
    return p

The query works, however, Arango is refusing to use the persistent vertex-centric index I have set.

Profile:

Query String (94 chars, cacheable: false):
 for v, e, p in 1..5 inbound "nodes/a" edges
     FILTER p.edges[*].v ANY == true
     return p

Execution plan:
 Id   NodeType          Calls   Items   Filtered   Runtime [s]   Comment
  1   SingletonNode         1       1          0       0.00001   * ROOT
  2   TraversalNode         1       9          0       0.00162     - FOR v  /* vertex */, p  /* paths: vertices, edges */ IN 1..5  /* min..maxPathDepth */ INBOUND 'nodes/a' /* startnode */  edges
  3   CalculationNode       1       9          0       0.00014       - LET #7 = (p.`edges`[*].`v` any == true)   /* simple expression */
  4   FilterNode            1       6          3       0.00009       - FILTER #7
  5   ReturnNode            1       6          0       0.00002       - RETURN p

Indexes used:
 By   Name   Type   Collection   Unique   Sparse   Cache   Selectivity   Fields      Stored values   Ranges
  2   edge   edge   edges        false    false    false       77.78 %   [ `_to` ]   [  ]            base INBOUND

Traversals on graphs:
 Id  Depth  Vertex collections  Edge collections  Options                                  Filter / Prune Conditions
 2   1..5                       edges             uniqueVertices: none, uniqueEdges: path                           

Optimization rules applied:
 Id   RuleName
  1   optimize-traversals

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Cache Hits/Misses   Filtered   Peak Mem [b]   Exec Time [s]
           0            0           0           45              10 / 0          3          32768         0.00378

Query Profile:
 Query Stage               Duration [s]
 initializing                   0.00000
 parsing                        0.00011
 optimizing ast                 0.00000
 loading collections            0.00001
 instantiating plan             0.00019
 optimizing plan                0.00101
 instantiating executors        0.00052
 executing                      0.00194
 finalizing                     0.00005

It is worth noting that if I change ANY to ALL, the index is used.

In a real life scenario, nodes/a is a supernode with potentially hundreds of thousands of edges at the first hop, and this query takes a long time to execute.

How can I optimize this query to use an index?

0

There are 0 best solutions below