Sorry for the long post, but I want to avoid any misunderstanding about what I'm looking for :)
I am currently discovering graph databases and experimenting a bit with bulbflow/neo4j. I am trying to use Gremlin for most of my requests, but I do not know whether the request I want is even feasible. I may even be wrong to use a graph db for this use case, so don't hesitate to tell me whether you think I'm on the right path or not.
First, let me provide a bit of context:
I work on an early-stage open-source project, which is a compiler for a DSL that generates C code. We are currently planning to rewrite the whole thing in Python for many reasons (the language, a redesign, opening up to a community and such...). The compiler includes what I'll call a cache of the compiled interfaces and templates. The interfaces describe the templates, and each template is associated with a configuration (a list of typed values associated with variables described by the interfaces).
The aim of the request I want to build is to select a single template implementation depending on an input configuration (it is actually used in the generation mechanism of the compiler). In the end, I want to be able to request, directly through Gremlin if possible at all, the single element I'm looking for, in order to guarantee the uniqueness of the elements found in this "cache". Currently, I match this configuration manually in the Python code, but I want to know whether it is feasible to do it directly in Gremlin.
-
So let's define a sample graph for my use-case: We have three types of vertices:
- Def (Definition), contains a String property called "signature", which is actually the signature of the template defined by this node.
- Impl (Implementation), containing two properties which are paths to the original source and pre-compiled files.
- Var (variable), containing a String property which is the signature of the variable.
Then, a few kinds of edges:
- Def -> impl_by -> Impl (multiple implementations can exist for a definition, does not contain any property)
- Impl -> select_by -> Var (implementations may be selected through a constraint over a configuration variable's value; each edge of this type actually carries three properties: type, value, and constraint, the latter being a comparison operator)
The select_by edge (or relationship, following bulbflow's vocabulary) describes a selection constraint, and thus has the following properties:
- value (value associated with the variable for the origin implementation)
- op (comparison operator telling which kind of comparison to make to decide whether the constraint holds)
This translates into a graph such as the following (I'll omit the types from the select_by edges in this graph):
Def
 |
 +-- (1) Impl
 |      +-- select_by { value="John",  op="="  } --> Var("name")
 |      +-- select_by { value=12,      op=">"  } --> Var("ver")
 |
 +-- (2) Impl
 |      +-- select_by { value="Peter", op="="  } --> Var("name")
 |      +-- select_by { value=15,      op="<"  } --> Var("ver")
 |
 +-- (3) Impl
        +-- select_by { value="Kat",   op="!=" } --> Var("name")
        +-- select_by { value=9,       op=">"  } --> Var("ver")
What I want to do is to select one (or more) Impl depending on their relationship with the Vars. Let's say I have a configuration as follows:
Config 1:
variable="name", value="Peter"
variable="ver", value=16
This would select Impl(3), since "Peter" != "Kat" AND 16 > 9. It would not select Impl(1), whose constraint requires name = "John", nor Impl(2), since 16 is not < 15.
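To make the intended semantics concrete, here is a minimal Python sketch of the matching I currently do by hand. The dict layout, the `OPS` table and the `select_impls` name are only illustrative assumptions (this is not how bulbflow stores or exposes anything); each constraint is read as "config value, op, edge value".

```python
import operator

# Map the 'op' strings stored on select_by edges to Python comparisons.
# Each comparison is applied as: config_value <op> edge_value.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}

# The sample graph, flattened to: impl id -> list of (variable, op, value).
# Hypothetical layout, used only to illustrate the selection rule.
IMPLS = {
    1: [("name", "=", "John"), ("ver", ">", 12)],
    2: [("name", "=", "Peter"), ("ver", "<", 15)],
    3: [("name", "!=", "Kat"), ("ver", ">", 9)],
}


def select_impls(impls, config):
    """Return the ids of implementations whose every constraint holds."""
    selected = []
    for impl_id, constraints in sorted(impls.items()):
        if all(var in config and OPS[op](config[var], value)
               for var, op, value in constraints):
            selected.append(impl_id)
    return selected


config_1 = {"name": "Peter", "ver": 16}
print(select_impls(IMPLS, config_1))  # prints [3]
```

An implementation is kept only if all of its select_by constraints hold, and a variable missing from the configuration makes the constraint fail. That last rule is my own choice for the sketch.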
I was blocked on multiple levels, so I was starting to wonder if this was even feasible:
- I could not find how to give such arguments (the configuration) to a gremlin script
- I could not find how to select the Impl based on conditions over the outgoing edges.
I hope this wasn't too confusing.
Cheers, and thanks !
EDIT:
I managed to make part of my request work by repeatedly using backtracking and filters. The request (X being the starting vertex, VALUE the value I want to match, and NAME the name of the variable to be matched) looks like this:
Basis of the request:
g.v(X).out('impl_by').as('implem')
Repeat this part for each VALUE/NAME pair:
.outE('select_by').filter{it.value=='VALUE'}
.inV.filter{it.name=='NAME'}
.back('implem')
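Since the middle part has to be repeated once per configuration entry, the script can be assembled on the Python side. The following is only a sketch under a few assumptions: the helper names `gremlin_literal` and `build_query` are made up, the value and op live on the select_by edge (hence `outE`/`inV`), the Var vertex exposes its name as `name`, and only strings and numbers are handled. It builds the script text and does not send anything to the server.

```python
def gremlin_literal(value):
    """Render a Python string or number as a Groovy literal."""
    if isinstance(value, str):
        # Escape backslashes and single quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'%s'" % escaped
    return str(value)


def build_query(start_id, config, impl_label="impl_by"):
    """Build a Gremlin script keeping the Impls that match config exactly."""
    steps = ["g.v(%s).out('%s').as('implem')" % (start_id, impl_label)]
    for name, value in sorted(config.items()):
        # One edge filter, one vertex filter, then jump back to the Impl.
        steps.append(".outE('select_by').filter{it.value==%s}"
                     % gremlin_literal(value))
        steps.append(".inV.filter{it.name==%s}" % gremlin_literal(name))
        steps.append(".back('implem')")
    return "".join(steps)


print(build_query(1, {"name": "Peter", "ver": 16}))
```

Like the hand-written request, this only does exact matching and ignores the 'op' property.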
The only thing currently missing is that I do not know how to use the select_by edge's 'op' property to decide which filter to build. For instance, there are cases where I want to match the configuration exactly (and thus, as in this request, I ignore the 'op' property), but there are other cases where I want to take the 'op' property into account and use the associated comparator in the filters.
Is there any way to do that? (Or should I post another question?)