As the title says, I'm trying to parse for example
term(A, b, c(d, "e", 7))
into a Lua table like
{term, {A, b, {c, {d, "e", 7}}}}
This is the grammar I built:
local pattern = re.compile[=[
term <- variable / function
argument <- variable / lowercase /number / string
function <- {|lowercase {|(open argument (separator (argument / function))* close)?|}|}
variable <- uppercase
lowercase <- {[a-z][A-Za-z0-9]*}
uppercase <- {[A-Z][A-Za-z0-9]*}
string <- '"' {~ [^"]* ~} '"'
number <- {[0-9]+}
close <- blank ")"
open <- "(" blank
separator <- blank "," blank
blank <- " "*
]=]
I'm having the following problems:
- It can't parse nested terms. For the example above it returns only `{term, {} }` (while it's ok with `term(A, b, c)`).
- To strip the quotes from the strings I used `{~ ~}`, but because of that I had to move all the captures from `argument` and `term` into the rules below them. Is there a way to avoid this?
- I'd like to have a key associated with each element to specify its type, for example instead of `A` something like `{value = "A", type = "variable"}`. I found a way to do this with `{:name: :}`, but the order of the elements in the table is lost (because it doesn't create a new table but simply adds a key, in this case `variable="A"`, and the order of these elements is not fixed). How can I tag the items while maintaining the order?
In your grammar you have:
Keep in mind that lpeg tries the alternatives in a rule in the order you wrote them. Once it finds a match, lpeg won't consider further possible matches in that grammar rule, even if there could be a "better" match later on.
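To make the ordered-choice behaviour concrete, here is a minimal sketch using two throwaway patterns (the names `short_first` and `long_first` are made up for illustration and are not part of your grammar). The only difference between them is the order of the alternatives:

```lua
local re = require "re"  -- the re module that ships with LPeg

-- Ordered choice: the first alternative that matches wins,
-- even if a later alternative would consume more input.
local short_first = re.compile[[ {[a-z]+} / {[a-z]+ "(" [a-z]* ")"} ]]
local long_first  = re.compile[[ {[a-z]+ "(" [a-z]* ")"} / {[a-z]+} ]]

print(short_first:match("c(d)"))  --> c
print(long_first:match("c(d)"))   --> c(d)
```

With the shorter alternative first, the match stops after `c` and the call syntax is never tried.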
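Here it fails to match nested function calls because it sees that `c` can match `argument` (through its `lowercase` alternative). Since `argument` is listed before `function` in `(argument / function)`, lpeg doesn't consider the latter, and so it stops parsing the tokens that come after. The `(` that follows `c` then matches neither `separator` nor `close`, so the whole optional argument list fails and you get an empty table.

As an experiment, I've modified your grammar slightly and added some table and named captures for most of the non-terminals you're interested in.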
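As a rough sketch of what such a reordered grammar could look like (assuming an LPeg version whose `re` module supports `{| |}` table captures; the rule names `func` and `atom` and the `type`/`name`/`value` keys are illustrative choices, not something fixed by LPeg): `func` is tried before `atom`, and every element gets its own table, so the tags live in the associative part while the arguments keep their positions.

```lua
local re = require "re"

local pattern = re.compile[=[
  term      <- func / atom / variable
  -- func must come before atom, otherwise "c" alone would win over "c(...)"
  argument  <- func / atom / variable / number / string
  func      <- {| {:type: '' -> 'function' :} {:name: lowercase :}
                  open argument (separator argument)* close |}
  atom      <- {| {:type: '' -> 'atom' :}     {:value: lowercase :} |}
  variable  <- {| {:type: '' -> 'variable' :} {:value: uppercase :} |}
  number    <- {| {:type: '' -> 'number' :}   {:value: [0-9]+ :} |}
  -- the quotes sit outside the capture, so no substitution capture is needed
  string    <- {| {:type: '' -> 'string' :} '"' {:value: [^"]* :} '"' |}
  lowercase <- [a-z][A-Za-z0-9]*
  uppercase <- [A-Z][A-Za-z0-9]*
  open      <- "(" blank
  close     <- blank ")"
  separator <- blank "," blank
  blank     <- " "*
]=]

local t = pattern:match('term(A, b, c(d, "e", 7))')

-- Arguments are in the array part, in source order; tags are named keys.
for i, arg in ipairs(t) do
  print(i, arg.type, arg.value or arg.name)
end
```

Here `'' -> 'function'` matches the empty string and produces a constant, which is one way to attach a fixed tag inside a table capture.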
With a quick pattern test:
Which gives the following output on my machine:
Looking carefully at the above, you'll notice that the function arguments appear in the index part of the table in the order that they were passed in. OTOH the `type` and `name` can appear in any order since they're in the associative part of the table. You can wrap those "attributes" in another table and put that inner attribute table in the index part of the outer table.

Edit: Here's a revised grammar to make the parse a bit more uniform. I've removed the `term` capture to help prune some unnecessary branches.

Which yields the following: