Lpeg "empty loop in rule" error

530 Views Asked by At

Can anyone provide a clear explanation and some simple examples that show this error, apparently related to match-time capture (Cmt) ?

I don't understand the only mention that I can find, which is at

http://lua-users.org/lists/lua-l/2013-06/msg00086.html

Thanks

1

There are 1 best solutions below

0
On

So this question is a bit old, but it's one of the first search results. There's not a lot on the Internet about this, and it may not be super obvious what's wrong.

The error message is a bit misleading, but what's happening - in formal PEG terms, at least as I understand them - there is a repetition operator applied to an parsing expression that can consume no input.

Or other words, LPeg has detected a loop that can match an empty string, which will never complete. There's a pure Lua implementation called LuLPeg which lacks this particular check, and if you execute your grammar it could easily enter an infinite loop.

I'm tinkering with little toy BASIC-ish language, and had the above issue with the following:

grammar = P{ "input",
  input = V"block"^0 * -1,
  block = V"expression"^0,
  -- (define expression here)
}

With that idea that the root input is an optional block of code, and a block is zero or more expressions. This is pretty simplified, of course, I'm leaving out whitespace handling and that sort of thing. But what happens when you call grammar:match("")?

  1. remaining string: "", at input. See if it matches a block.
  2. remaining string: "", at block. See if it matches an expression.
  3. remaining string: "", at expression. For the sake of time, let's say it doesn't match
  4. remaining string: "", at block. Rule concludes with zero expressions, no input consumed.
  5. remaining string: "", at input. One block consumed, check if more blocks match.
  6. remaining string: "", at block. See if it matches an expression.

And so on. Since V"block" matches the empty string, input can find an infinite number of blocks to fulfil the rule V"block"^0. In this case, there's two good solutions: Set input to at most one block, or require block to be a least one expression and wherever there could be a block set it to ^0. So either:

grammar = P{ "input", -- blocks can be empty, input contains one (empty or otherwise) block
  input = V"block" * -1,
  block = V"expression"^0,
  -- (define expression here)
}

Or:

grammar = P{ "input", -- blocks must be at least one expression, root can have one
  input = V"block"^0 * -1,
  block = V"expression"^1,
  -- (define expression here)
}

In the first case, an empty string will match a block, which fulfills the the input rule. In the second, an empty string will fail block, fulfilling the input rule with zero matching blocks.

I haven't needed to use Cmt yet, but I believe what happened was old versions of LPeg assumed the function would either fail or consume input, even if the rule inside the Cmt call could match an empty string. More recent releases don't have this assumption.