I have the automata
S'-> S
S -> a | XbY
X -> ε | aZ | Y
Y -> b | XX
z -> ab | SS
After doing one round of removing null productions i got:
S'-> S
S -> a | XbY | bY
X -> aZ | Y
Y -> b | XX | X | ε
z -> ab | SS
After doing one more round i got:
S'-> S
S -> a | XbY | bY | Xb | b
X -> aZ | Y | ε
Y -> b | XX | X
z -> ab | SS
With this i'm kind of stuck in a loop because of the X -> Y and Y -> X, what should i do to fix this?
Once you've identified a nullable non-terminal and processed it, you con't have to process it again. The usual approach is to first identify the nullable non-terminals and then process each production just once.
Note that you should binify the productions (that is, reduce the right-hand sides to at most two non-terminals) before you eliminate nullable non-terminals. Null eliminate is worst-case exponential in the size of the longest right-hand side. Binify is worst-case quadratic, and after you binify, the longest right-hand side has length 2, so eliminate nulls is now linear in the (possibly quadratically-increased) grammar size.
There's a simple linear-time algorithm for nullability detection. With each non-terminal, you associate an
is_nullable
flag, initially false (meaning unknown) and a wait-list, initially empty. With each production, you associate an index (into the right-hand side), initially 0. (Epsilon productions have nothing on the right-hand side, since that's what epsilon means.) All productions are then put onto a worklist, and you then do the following until the worklist is empty:Take a production off the worklist.
While its pointer is before a non-terminal whose is_nullable flag is true, advance the pointer.
Then one of the three following must be true:
If its pointer is at the end of the right-hand side, mark the non-terminal as nullable and add its wait-list to the worklist.
If its pointer is just before a non-terminal whose
is_nullable
flag is still false, add the production to the waitlist for that non-terminal.If its pointer is just before a terminal, do nothing.
Note that if a production is placed on the worklist more than once (in the second possibility in step 3), then its pointer will have been advanced from the previous time. Furthermore, each advance in step 2 must be at a point which has not previously been handled in step 2. So in all, each point in each right-hand side of the grammar participates in at most two constant-time steps, and the algorithm therefore has time linear in the size of the grammar (that is, the sum of the lengths of the productions).
If you've already reduced all the right-hand sides to either single terminals or at most two non-terminals (which, as mentioned above, you should do), then this algorithm is radically simplified. (The grammar might be quadratically larger, but this step still doesn't increase the time complexity.)
If you're doing this by hand, you can just mark the pointer with a red dot. The current pointer for the production is always the right-most dot, so you don't have to worry about erasing them.