I found a state diagram of a DFA (deterministic finite automaton) with its RegEx in a script, but this diagram is just a sample without any explanations. So I tried to derive the RegEx from the DFA state diagram by myself and got the expression: ab+a+b(a*b)*
. I dont understand how I get the original RegEx (ab+a*)+ab+
mentioned in the script. Here my derivation:
I am grateful for any help, links, references and hints!
You've derived the regex correctly here. The expression you have
ab+a+b(a*b)*
is equivalent to(ab+a*)+ab+
- once you've finished the DFA state elimination (you have a single transition from the start state to an accepting state), there aren't any more derivations to do. You may however get different final regexes depending on the order you eliminate the states in, and they should all be valid assuming you did the eliminations correctly. The state elimination method is also not guaranteed to be able to produce all equivalent regular expressions for a particular DFA, so it's okay that you didn't arrive at exactly the original regex. You can also check the equivalence of two regular expressions here.For your particular example though to show that this DFA is equivalent to this original regex
(ab+a*)+ab+
, take a look at the DFA at this state of the elimination (somewhere in between the second and third steps you've shown above):Let's expand our expression
(ab+a*)+ab+
to(ab+a*)(ab+a*)*ab+
. So in the DFA, the first(ab+a*)
gets us from state 0 to partway between states 2 and 3 (thea*
ina*a
).Then the next part
(ab+a*)*
means we're allowed to have 0 or more copies of(ab+a*)
. If there are 0 copies, we'll just finish with anab+
, reading ana
from the second half of thea*a
transition from 2 to 3 and ab
from the 3 to 4 transition, landing us in state 4 which is accepting and where we can take the self loop and read as manyb
's as we want.Otherwise we have 1 or more copies of
(ab+a*)
, again reading ana
from the second half of thea*a
transition from 2 to 3 and ab
from the 3 to 4 transition. Thea*
comes from the first half of thea*ab
self loop on state 4 and the second halfab
is either the finalab+
of the regex or the start of another copy of(ab+a*)
. I'm not sure if there's a state elimination that arrives at exactly the expression(ab+a*)+ab+
but for what it's worth, I think the regex you derived more clearly captures the structure of this DFA.