Problems with setResultsName in pyparsing


I have a problem parsing arithmetic expressions with pyparsing. I have the following grammar:

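from pyparsing import Literal, Regex, infixNotation, oneOf, opAssoc

# integer_format, float_format and bool_format are defined elsewhere (not shown)
numeric_value = (integer_format | float_format | bool_format)("value*")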
identifier = Regex('[a-zA-Z_][a-zA-Z_0-9]*')("identifier*")

operand = numeric_value | identifier

expop = Literal('^')("op")
signop = oneOf('+ -')("op")
multop = oneOf('* /')("op")
plusop = oneOf('+ -')("op")
factop = Literal('!')("op")

arithmetic_expr = infixNotation(operand,
    [("!", 1, opAssoc.LEFT),
     ("^", 2, opAssoc.RIGHT),
     (signop, 1, opAssoc.RIGHT),
     (multop, 2, opAssoc.LEFT),
     (plusop, 2, opAssoc.LEFT),]
    )("expr")

I would like to use this to parse arithmetic expressions, e.g.,

expr = "9 + 2 * 3"
parse_result = arithmetic_expr.parseString(expr)

I have two problems here.

First, when I dump the result, I get the following:

[['9', '+', ['2', '*', '3']]]
- expr: ['9', '+', ['2', '*', '3']]
  - op: '+'
  - value: ['9']

The corresponding XML output is:

<result>
  <expr>
    <value>9</value>
    <op>+</op>
    <value>
      <value>2</value>
      <op>*</op>
      <value>3</value>
    </value>
  </expr>
</result>

What I would like is for ['2', '*', '3'] to show up as expr, i.e.,

<result>
  <expr>
    <value>9</value>
    <op>+</op>
    <expr>
      <value>2</value>
      <op>*</op>
      <value>3</value>
    </expr>
  </expr>
</result>

However, I am not sure how to use setResultsName() to achieve this.

Second, when I iterate over the results, I get plain strings for the simple parts. Hence, I use the XML "hack" as a workaround (I got the idea from here: `pyparsing`: iterating over `ParsedResults`). Is there a better method now?

Best regards Apo

I have one more small question about how to process the parsed results. My first attempt was to use a recursive loop like this:

def recurse_arithmetic_expression(tokens):
    for t in tokens:
        if t.getResultName() == "value":
            pass # do something...
        elif t.getResultName() == "identifier":
            pass # do something else..
        elif t.getResultName() == "op":
            pass # do something completely different...
        elif isinstance(t, ParseResults):
            recurse_arithmetic_expression(t)

However, t can also be a plain string or an int/float, so I get an exception when I call getResultName on it. And when I use asDict, the order of the tokens is lost.

Is it possible to obtain an ordered dict and iterate over its keys with something like

for tag, token in tokens.iteritems():

where tag specifies the type of the token (e.g., op, value, identifier, expr...) and token is the corresponding token?

Best answer:

If you want pyparsing to convert numeric strings to integers, you can add a parse action to have that done at parse time. Or use the predefined integer and real number expressions in pyparsing_common (a namespace class imported with pyparsing); pyparsing_common.number matches either form and returns an int or a float:

numeric_value = (pyparsing_common.number | bool_format)("value*")
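Both options in a minimal standalone sketch, assuming a pyparsing version that still provides the camelCase names used throughout this thread (`int_value` and `values` are just illustrative names):

```python
from pyparsing import OneOrMore, Word, nums, pyparsing_common

# Option 1: a parse action converts each matched run of digits to an int
int_value = Word(nums).setParseAction(lambda t: int(t[0]))

# Option 2: pyparsing_common.number already converts its match to int or float
values = OneOrMore(pyparsing_common.number)

print(int_value.parseString("42")[0] + 1)      # 43
print(values.parseString("9 2.5 7").asList())  # [9, 2.5, 7]
```

Either way, the tokens that reach your code are Python numbers rather than strings, so later processing does not need to call int() or float() itself.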

For your naming issue, you can add parse actions to be run at each level of infixNotation. In the code below, I've added a parse action that just adds the 'expr' name to the current parsed group. You'll also want to add '*' to all of your ops so that repeated operators get the same "keep all, not just the last" behavior for results names:

from pyparsing import (Literal, Regex, infixNotation, oneOf, opAssoc,
                       pyparsing_common)

bool_format = oneOf("true false")
numeric_value = (pyparsing_common.number | bool_format)("value*")
identifier = Regex('[a-zA-Z_][a-zA-Z_0-9]*')("identifier*")

operand = numeric_value | identifier

expop = Literal('^')("op*")
signop = oneOf('+ -')("op*")
multop = oneOf('* /')("op*")
plusop = oneOf('+ -')("op*")
factop = Literal('!')("op*")


def add_name(s,l,t):
    t['expr'] = t[0]

arithmetic_expr = infixNotation(operand,
    [("!", 1, opAssoc.LEFT, add_name),
     ("^", 2, opAssoc.RIGHT, add_name),
     (signop, 1, opAssoc.RIGHT, add_name),
     (multop, 2, opAssoc.LEFT, add_name),
     (plusop, 2, opAssoc.LEFT, add_name),]
    )("expr")

See how these results look now:

arithmetic_expr.runTests("""
    9 + 2 * 3 * 7
""")

print(arithmetic_expr.parseString('9+2*3*7').asXML())

gives:

9 + 2 * 3 * 7
[[9, '+', [2, '*', 3, '*', 7]]]
- expr: [9, '+', [2, '*', 3, '*', 7]]
  - expr: [2, '*', 3, '*', 7]
    - op: ['*', '*']
    - value: [2, 3, 7]
  - op: ['+']
  - value: [9]


<expr>
  <expr>
    <value>9</value>
    <op>+</op>
    <expr>
      <value>2</value>
      <op>*</op>
      <value>3</value>
      <op>*</op>
      <value>7</value>
    </expr>
  </expr>
</expr>
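The effect of the trailing '*' (visible above as op: ['*', '*'] rather than only the last '*') can be seen in isolation in a small sketch, assuming pyparsing is installed; `last_only` and `keep_all` are illustrative names. expr("name*") is shorthand for expr.setResultsName("name", listAllMatches=True):

```python
from pyparsing import OneOrMore, Word, alphas

# Without '*', each new match overwrites the previous one stored under the name
last_only = OneOrMore(Word(alphas)("word"))

# With '*', every match is kept under the name, in order
keep_all = OneOrMore(Word(alphas)("word*"))

print(last_only.parseString("a b c")["word"])          # c
print(keep_all.parseString("a b c")["word"].asList())  # ['a', 'b', 'c']
```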

Note: I generally discourage people from using asXML, as it has to do a fair bit of guessing to create its output. You are probably better off navigating the parsed results manually. Also, look at some of the examples on the pyparsing wiki Examples page, especially SimpleBool.py, which uses classes for the per-level parse actions used in infixNotation.

EDIT:

At this point, I really want to dissuade you from continuing on this path of using results names to guide evaluation of the parsed results. Please look at these two methods for recursing over the parsed tokens (note that the method you were looking for is getName, not getResultName):

from pyparsing import ParseResults

result = arithmetic_expr.parseString('9 + 2 * 4 * 6')

def iterate_over_parsed_expr(tokens):
    for t in tokens:
        if isinstance(t, ParseResults):
            tag = t.getName()
            print(t, 'is', tag)
            iterate_over_parsed_expr(t)
        else:
            print(t, 'is', type(t))

iterate_over_parsed_expr(result)

import operator
op_map = {
    '+' : operator.add,
    '-' : operator.sub,
    '*' : operator.mul,
    '/' : operator.truediv
    }
def eval_parsed_expr(tokens):
    t = tokens
    if isinstance(t, ParseResults):
        # evaluate initial value as left-operand
        cur_value = eval_parsed_expr(t[0])
        # iterate through remaining tokens, as operator-operand pairs
        for op, operand in zip(t[1::2], t[2::2]):
            # look up the correct binary function for operator
            op_func = op_map[op]
            # evaluate function, and update cur_value with result
            cur_value = op_func(cur_value, eval_parsed_expr(operand))

        # no more tokens, return the value
        return cur_value
    else:
        # token is just a scalar int or float, just return it
        return t

print(eval_parsed_expr(result))  # gives 57, i.e. 9 + (2 * 4 * 6)

eval_parsed_expr relies on the structure of the parsed tokens, rather than on result names. For this limited case, the tokens are all binary operators, so for each nested structure, the resulting tokens are "value [op value]...", and the values themselves could be ints, floats, or nested ParseResults - but never strs, at least not for the 4 binary operators I've hard-coded in this method. Rather than try to special-case yourself to death to handle unary ops and right-associative ops, please look at how this is done in eval_arith.py (http://pyparsing.wikispaces.com/file/view/eval_arith.py/68273277/eval_arith.py), by associating evaluator classes to each operand type, and each level of the infixNotation.
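Roughly what that class-per-level approach looks like, as a minimal sketch assuming pyparsing is installed. The class names `EvalConstant`, `EvalSignOp` and `EvalBinOp` and the helper `evaluate` are hypothetical (loosely modeled on the idea behind eval_arith.py, not copied from it), and only integers, unary signs and the four binary operators are handled. Each class is attached as a parse action, so parsing directly produces a tree of objects that know how to evaluate themselves:

```python
import operator
from pyparsing import Word, infixNotation, nums, oneOf, opAssoc


class EvalConstant:
    """Parse action for an operand: holds an int."""
    def __init__(self, tokens):
        self.value = int(tokens[0])

    def eval(self):
        return self.value


class EvalSignOp:
    """Parse action for the unary sign level."""
    def __init__(self, tokens):
        # tokens[0] is the group for this level: [sign, operand]
        self.sign, self.operand = tokens[0]

    def eval(self):
        value = self.operand.eval()
        return -value if self.sign == '-' else value


class EvalBinOp:
    """Parse action for left-associative binary levels."""
    ops = {'+': operator.add, '-': operator.sub,
           '*': operator.mul, '/': operator.truediv}

    def __init__(self, tokens):
        # tokens[0] is the group for this level: [operand, op, operand, op, ...]
        self.tokens = tokens[0]

    def eval(self):
        result = self.tokens[0].eval()
        for op, operand in zip(self.tokens[1::2], self.tokens[2::2]):
            result = self.ops[op](result, operand.eval())
        return result


operand = Word(nums).setParseAction(EvalConstant)
arith = infixNotation(operand, [
    (oneOf('+ -'), 1, opAssoc.RIGHT, EvalSignOp),
    (oneOf('* /'), 2, opAssoc.LEFT, EvalBinOp),
    (oneOf('+ -'), 2, opAssoc.LEFT, EvalBinOp),
])


def evaluate(text):
    # the whole expression parses to a single evaluator object
    return arith.parseString(text, parseAll=True)[0].eval()


print(evaluate('9 + 2 * 4 * 6'))  # 57
print(evaluate('-3 + 10 / 4'))    # -0.5
```

Because each precedence level gets its own class, supporting another operator (say '^' or '!') means adding one class and one tuple to the infixNotation list, instead of another special case inside a single recursive function.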