Gremlin / Bulbflow: How to select nodes based on their edges and related vertices' properties


Sorry for the long post, but I want to avoid any misunderstanding about what I'm looking for :)

I am currently discovering graph databases and experimenting a bit with Bulbflow/Neo4j. I am trying to use Gremlin for most of my requests, but I do not know whether the request I want is even feasible. I may even be wrong to use a graph DB for such a use case, so don't hesitate to tell me whether you think I'm on the right path or not.

First, let me provide a bit of context:

I work on an early-stage open-source project, which is a compiler for a DSL that generates C code. We are currently planning to rewrite the whole thing in Python for many reasons (the language, a redesign, opening up to a community and so on). The compiler includes what I'll call a cache of the compiled interfaces and templates. The interfaces describe the templates, and each template is associated with a configuration (a list of typed values associated with variables described by the interfaces).

The aim of the request I want to build is to select a single template implementation depending on an input configuration (this is actually used in the generation mechanism of the compiler). In the end, I want to be able to request, directly through Gremlin (if possible at all), the single element I'm looking for, in order to guarantee uniqueness of the elements found in this "cache". Currently, I match this configuration manually in the Python code, but I want to know whether it is feasible to do it directly within Gremlin.

-

So let's define a sample graph for my use-case: We have three types of vertices:

  1. Def (Definition), contains a String property called "signature", which is actually the signature of the template defined by this node.
  2. Impl (Implementation), containing two properties which are paths to the original source and pre-compiled files.
  3. Var (variable), containing a String property which is the signature of the variable.

Then, two kinds of edges:

  • Def -> impl_by -> Impl (multiple implementations can exist for a definition, does not contain any property)
  • Impl -> select_by -> Var (an implementation may be selected through a constraint on a configuration variable's value; each edge of this type carries three properties: type, value, and a comparison operator)

The select_by edge (or relationship, in Bulbflow's vocabulary) describes a selection constraint, and thus, besides the type, has the following properties:

  • value (value associated with the variable for the origin implementation)
  • op (comparison operator telling which comparison must hold for the constraint to be satisfied)

This translates into a graph such as the following (I'll omit the types from the select_by edges in this graph):

              -- select_by { value="John", op="="}  ---------
              |                                              \
    (1)--Impl--- select_by { value=12, op=">"}      ------    \
    |                                                     \    \
    |                                                      \    |- Var("name")
    |         |- select_by { value="Peter", op="="} -----------/
Def (2)--Impl--                                              \/
    |         |- select_by { value=15, op="<"}      ----     /\
    |                                                   \   /  \
    |                                                    |-/----|--- Var("ver")
    (3)--Impl--- select_by { value="Kat", op="!="}  ------/    /
            |                                                 /
            |--- select_by { value=9, op=">"}       ---------/

What I want to do is to select one (or more) Impl depending on their relationship with the Vars. Let's say I have a configuration as follows:

Config 1:

variable="name", value="Peter"
variable="ver", value=16

This would select Impl(3), since "Peter" != "Kat" AND 16 > 9, but not Impl(1), since "Peter" is not equal to "John", nor Impl(2), since 16 is not less than 15.
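For reference, here is a minimal Python sketch of the selection rule described above, roughly what the manual matching in the compiler amounts to. The names `OPS`, `IMPLS`, `matches` and `select_impls` are made up for this illustration, and the constraints are read off the sample graph, assuming the string values point to Var("name") and the integer values to Var("ver"). Each constraint is evaluated as "configuration value, op, edge value".

```python
import operator

# Maps the 'op' property of a select_by edge to a Python comparison.
# Only the four operators used in the sample graph are covered.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}

# Sample graph: implementation id -> (variable, op, value) constraints,
# one per outgoing select_by edge (assumed wiring, see the lead-in).
IMPLS = {
    1: [("name", "=", "John"), ("ver", ">", 12)],
    2: [("name", "=", "Peter"), ("ver", "<", 15)],
    3: [("name", "!=", "Kat"), ("ver", ">", 9)],
}


def matches(constraints, config):
    """Return True if every constraint holds for the configuration."""
    return all(
        var in config and OPS[op](config[var], value)
        for var, op, value in constraints
    )


def select_impls(impls, config):
    """Return the sorted ids of all implementations matching the config."""
    return sorted(i for i, cs in impls.items() if matches(cs, config))


config_1 = {"name": "Peter", "ver": 16}
print(select_impls(IMPLS, config_1))  # [3]
```

The open question is whether the same per-edge operator lookup can be expressed inside a single Gremlin traversal instead of in Python.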

I got stuck on two points, so I was starting to wonder whether this was even feasible:

  • I could not find how to give such arguments (the configuration) to a gremlin script
  • I could not find how to select the Impl based on conditions over the outgoing edges.

I hope this wasn't too confusing.

Cheers, and thanks !

EDIT:

I managed to make part of my request work by repeatedly using backtracking and filters. The request (X being the starting vertex, VALUE the value I want to match, and NAME the name of the variable to be matched) looks like this:

Basis of the request:

g.v(X).out('impl_by').as('implem')

Repeat this part for each VALUE/NAME pair:

.outE('select_by').filter{it.value=='VALUE'}
.inV.filter{it.name=='NAME'}
.back('implem')
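
A minimal sketch of how the repeated part could be generated from Python for an arbitrary configuration. `build_query` and the placeholder names (`start`, `name0`, `value0`, ...) are hypothetical. The sketch uses `outE`/`inV` because the filter reads a property of the edge, takes the `impl_by` and `select_by` labels from the edge list above, and assumes the script is later run with bound parameters (Bulbs appears to offer `g.gremlin.query(script, params)` for that, but this is untested here).

```python
def build_query(start_id, config):
    """Return (script, params) for an exact-match traversal.

    config maps variable names to the values they must match; the
    'op' property of the select_by edges is ignored in this sketch.
    """
    steps = ["g.v(start).out('impl_by').as('implem')"]
    params = {"start": start_id}
    # Sort the pairs so the generated script is deterministic.
    for i, (name, value) in enumerate(sorted(config.items())):
        steps.append(
            ".outE('select_by').filter{it.value == value%d}"
            ".inV.filter{it.name == name%d}"
            ".back('implem')" % (i, i)
        )
        # Values travel as parameters, never inlined into the script.
        params["name%d" % i] = name
        params["value%d" % i] = value
    return "".join(steps), params


script, params = build_query(7, {"name": "Peter", "ver": 16})
print(script)
print(params)
```

Passing the values as parameters instead of pasting them into the script text avoids quoting problems with string values and would answer the first point above, if the parameter binding works as assumed.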

The only thing currently missing is that I do not know how to use the select_by edge's 'op' property to decide which filter to build. For instance, there are cases where I want to match the configuration exactly (and thus, as in this request, I ignore the 'op' property), but there are also cases where I want to take the 'op' property into account and use the associated comparator in the filters.

Is there any way to do that? (Or should I post another question?)
