RegEx that matches "variable" strings/sequences? + backtracking?

135 Views Asked by At

I want to use regex like language to match against variable-string (in my case sequence of character|words|numbers stored in a graph DB).

I found a way to implement RegEx engine : https://deniskyashif.com/2019/02/17/implementing-a-regular-expression-engine/

the problem is that it matches against static string. My case is sort of what I call variable-string/sequence.

F.e. let say I have stored the following sequences :

who; why; when; where;

Keep in mind I dont have the sequences available (so that I can loop over them), they are deconstructed to a graph. (you can think of interface to the sequence like a function which given prefix predicts/returns the next character)

if I match against regex : w* it should match/return all of the strings one after another /like in backtracking/

if i use : whe* => when, where

etc..

Is there a way to modify NFA, DFA in such a way that it will accommodate variable-string ?

I just started exploring implementing NFA and think the change has to be here :

  function search(nfa, word) { .... } 

it has to be search that passes the next expected regex-symbol/state i.e. given the previous string-symbol does the next-predicted-string-symbol match the expected regex-symbol ?

The regex should 'drive' the match, rather than the string ! It should be doable because the regex is deconstructed to finite states with the transitions..

what do you think ?


they are stored as a tree in graph db...f.e.can be represented as :

 lvl5: (where:.)
 lvl4: (wher:e), (when:.), (whom:.),  
 lvl3: (whe:r), (whe:n), (who:m), (who:.), (why:.)
 lvl2: (wh:y) , (wh:o), (wh:e)
 lvl1: (w:h)
 lvl0: w h y o .
1

There are 1 best solutions below

1
On

I don’t understand your question, but this regex could be the answer:

<prefix>.*?\b

Where <prefix> is w or whe etc.

This will match all words in the input that start with the prefix.

In whatever language you’re using, there should be a way to loop over all matches found for a given input.