How to recognize a context-free grammar with a Rust declarative macro

While I was implementing a regex tool in Rust, I tried to create a fancy macro to write regex:

regex!['a' 'b'? ('c' | 'd')*] // ab?(c|d)*

I want to do this using only macro_rules, with no procedural macros. I figured out that a stack is necessary, since this is a context-free grammar rather than a regular one. I managed to get the following language recognized:

expr
  = '1'
  | '*' expr
  | '+' expr expr

The stack only stores . markers, which count how many expressions still have to be skipped:

macro_rules! parse {
    // empty stack, parse the expression
    ([] 1 $($t:tt)*) => { val() };
    ([] * $($t:tt)*) => { uni(parse!([] $($t)*)) };
    ([] + $($t:tt)*) => { bin(parse!([] $($t)*), parse!([.] $($t)*)) };
    // not empty stack, ignore the first expression
    ([. $($o:tt)*] 1 $($t:tt)*) => { parse!([$($o)*] $($t)*) };
    ([. $($o:tt)*] * $($t:tt)*) => { parse!([. $($o)*] $($t)*) };
    ([. $($o:tt)*] + $($t:tt)*) => { parse!([. . $($o)*] $($t)*) };
}
macro_rules! expr {
    ($($t:tt)*) => { parse!([] $($t)*) };
}
let e = expr!(+ *1 1);
// which expands to:
let e = bin(uni(val()), val());
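For anyone who wants to try this, here is a minimal self-contained sketch of the snippet above. The `Expr` enum and the bodies of `val`, `uni` and `bin` are my assumptions for illustration (the snippet above only uses their names); the two macros are unchanged.

```rust
// Hypothetical tree type, only here so the example compiles.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Val,
    Uni(Box<Expr>),
    Bin(Box<Expr>, Box<Expr>),
}

// Assumed constructors matching the names used by the macro.
pub fn val() -> Expr { Expr::Val }
pub fn uni(e: Expr) -> Expr { Expr::Uni(Box::new(e)) }
pub fn bin(l: Expr, r: Expr) -> Expr { Expr::Bin(Box::new(l), Box::new(r)) }

macro_rules! parse {
    // empty stack, parse the expression
    ([] 1 $($t:tt)*) => { val() };
    ([] * $($t:tt)*) => { uni(parse!([] $($t)*)) };
    ([] + $($t:tt)*) => { bin(parse!([] $($t)*), parse!([.] $($t)*)) };
    // non-empty stack, skip the first expression
    ([. $($o:tt)*] 1 $($t:tt)*) => { parse!([$($o)*] $($t)*) };
    ([. $($o:tt)*] * $($t:tt)*) => { parse!([. $($o)*] $($t)*) };
    ([. $($o:tt)*] + $($t:tt)*) => { parse!([. . $($o)*] $($t)*) };
}

macro_rules! expr {
    ($($t:tt)*) => { parse!([] $($t)*) };
}

// Builds the tree for `+ *1 1` through the macro.
pub fn example() -> Expr {
    expr!(+ *1 1)
}
```

Calling `example()` from `main` and comparing it with `bin(uni(val()), val())` using `assert_eq!` is enough to check the expansion.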

The macro suffers from high complexity: it parses the same expression multiple times, a bit like a naive recursive Fibonacci.

At this point, it would be better to produce an array of tokens and use a proper parser to build the desired tree. But this is for the art, and to push the limits of macro_rules. Please see it as pure curiosity.

Is it possible to recognize infix operators?

Is it possible to handle precedence?

Is it possible to recognize the "follow" operator (when two expressions are side by side, as in 1 1)?

Is it possible to avoid parsing more than once the same expression?

Can we take advantage of stack elements other than just . ?
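To make the last two questions concrete, here is a rough sketch of the direction I have in mind, not a confirmed solution: the stack holds pending operators instead of . markers, and each token is consumed once. The names `parse_once!`, `expr_once!`, the `@shift` / `@reduce` internal rules and the `bin2` marker are all made up for this sketch, and the `Expr` type is the same assumed one as before. It still only covers the prefix grammar above.

```rust
// Hypothetical tree type and constructors, only here so the example compiles.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Val,
    Uni(Box<Expr>),
    Bin(Box<Expr>, Box<Expr>),
}
pub fn val() -> Expr { Expr::Val }
pub fn uni(e: Expr) -> Expr { Expr::Uni(Box::new(e)) }
pub fn bin(l: Expr, r: Expr) -> Expr { Expr::Bin(Box::new(l), Box::new(r)) }

macro_rules! parse_once {
    // @reduce: a complete expression ($e) is available; feed it to the stack top.
    // Empty stack: the expression is the result (trailing tokens are ignored).
    (@reduce [] ($e:expr) $($rest:tt)*) => { $e };
    // Pending unary operator: wrap and keep reducing.
    (@reduce [uni $($s:tt)*] ($e:expr) $($rest:tt)*) => {
        parse_once!(@reduce [$($s)*] (uni($e)) $($rest)*)
    };
    // Pending binary operator without operands: store the left one, read the right one.
    (@reduce [bin $($s:tt)*] ($e:expr) $($rest:tt)*) => {
        parse_once!(@shift [(bin2 $e) $($s)*] $($rest)*)
    };
    // Pending binary operator with its left operand: build the node.
    (@reduce [(bin2 $l:expr) $($s:tt)*] ($e:expr) $($rest:tt)*) => {
        parse_once!(@reduce [$($s)*] (bin($l, $e)) $($rest)*)
    };
    // @shift: read the next token, pushing operators onto the stack.
    (@shift [$($s:tt)*] 1 $($rest:tt)*) => {
        parse_once!(@reduce [$($s)*] (val()) $($rest)*)
    };
    (@shift [$($s:tt)*] * $($rest:tt)*) => {
        parse_once!(@shift [uni $($s)*] $($rest)*)
    };
    (@shift [$($s:tt)*] + $($rest:tt)*) => {
        parse_once!(@shift [bin $($s)*] $($rest)*)
    };
}

macro_rules! expr_once {
    ($($t:tt)*) => { parse_once!(@shift [] $($t)*) };
}

// Builds the tree for `+ * 1 1` in a single left-to-right pass.
pub fn example_once() -> Expr {
    expr_once!(+ * 1 1)
}
```

With this shape the number of expansions grows linearly with the number of tokens, since nothing is re-read. Whether the same trick extends to infix operators, precedence, or the "follow" operator is exactly what I am unsure about.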