I am trying to use pyparsing to parse the label used for DOT records. The syntax is:
rlabel → '{' field ( '|' field )* '}'
field → boxLabel | rlabel
boxLabel → [ ’<’ string ’>’ ] [ string ]
I have two issues. One, I am not sure how to handle boxLabel because it has to optionals. Two, I think I have a left recursion but since there is no "operator" not sure how to deal with it. What I've done so far:
langle = pyparsing.Literal('<')
rangle = pyparsing.Literal('>')
string = pyparsing.Word('alphanums')
port_id = pyparsing.Group(langle + string + rangle)
port_name = pyparsing.Group(string)
box_label = pyparsing.Group(pyparsing.Optional(port_id) + pyparsing.Optional(port_name))
rlabel = pyparsing.Forward()
field = box_label | rlabel
rlabel_content = pyparsing.ZeroOrMore(field + "|") + field
rlabel << pyparsing.nestedExpr(opener='{', content=rlabel_content, closer='}')
Parsing goes into infinite recursion.
Further,
>> print(box_label.parseString("<h> root"))
[[['<', 'h', '>']]]
and I have no idea why the port_name parsing is not present in the results.
Congrats on making this much progress on your pyparsing project, recursive grammars are tricky to get your head around the first time you do them.
First, a minor typo, but one which is probably puzzling you in that it creates a valid parser, but a strangely limited one.
should be
When you pass a string to
Word
, that defines the list of valid characters to use when parsing such a character group. SoWord("0123456789")
will parse an integer,Word("AB")
will parse any word made up of the letters "A" and "B".Word('alphanums')
will parse "alpha", "nums", "plums", "haphalump" and many other words, as long as they are made up of the letters in 'alphanums'. I'm pretty sure you want the pyparsing-defined stringpyparsing.alphanums
, which includes the characters 'A'-'Z', 'a'-'z', and '0'-'9'.Let's tackle the
box_label
question first. In general, you can implement "A and B or just A or just B" as:Or if you want to be more explicit, you can do:
But this starts to get hairy when you have to work with more than 2 expressions. Fortunately, you only have 2, and I chose the first format for my proposed solution below.
Part of the problem you are having with recursion is that you are using both
Forward
andnestedExpr
- usually just one or the other is sufficient. But in your case, there is an advantage to usingForward
, so I'll use that method.(I've also imported pyparsing as
pp
, to cut down on all the extra chars.)Create yourself a set of test strings, and call
runTests
with them to see how your parser behaves:runTests
will try to parse each given line, and then either dump the results, or show the parse exception: