Detecting invalid grammar with pyparsing


I have a simple grammar for evaluating logical expressions.

The "single" conditions have the form

keyword = value
keyword != value

On top of these, I want to allow logical combinations, optionally grouped with parentheses, e.g.

cond1 & cond2 & cond3
cond1 & ( cond2 | cond3 )
(cond1 & cond2) | cond3
cond1 | cond2 | cond3

but not combinations that mix & and | without parentheses, where the grouping would depend on operator precedence and could confuse users. So the following should be disallowed:

cond1 & cond2 | cond3

If I understand correctly, this latter requirement rules out operatorPrecedence(), and in any case I would like to understand how to do this at a lower level.
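To illustrate why operatorPrecedence() does not fit the requirement, here is a minimal sketch using infixNotation(), which superseded operatorPrecedence() in later pyparsing versions. It assumes '&' should bind more tightly than '|' (an arbitrary choice for the demo). With any such precedence table, the mixed expression is accepted and silently regrouped instead of being rejected.

```python
from pyparsing import Word, alphas, nums, oneOf, Group, infixNotation, opAssoc

# Single condition as in the question, grouped so each one stays a unit
condition = Group(Word(alphas + nums + '_.:-') + oneOf(['=', '!='])
                  + Word(alphas + nums + '_.:-;'))

# Operator levels are listed from highest to lowest precedence:
# here '&' binds more tightly than '|' (an assumption for this demo)
expr = infixNotation(condition, [
    ('&', 2, opAssoc.LEFT),
    ('|', 2, opAssoc.LEFT),
])

# The mixed expression parses without complaint and is grouped as (a & b) | c
result = expr.parseString('a=a & b=b | c=c', parseAll=True)
print(result.asList())
```

The precedence table is exactly what makes the input unambiguous to the parser, which is the opposite of what is wanted here: the user never sees the grouping that was chosen.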

The following grammar gets me almost where I want to be:

from pyparsing import Word, alphas, nums, oneOf, Literal, Group, Suppress, \
                        Forward, ZeroOrMore, ParseException, StringEnd
comparison_op_list = ['=', '!=']
logical_op_list = ['&', '|']
# Pyparsing expression for single conditions
keyword = Word(alphas + nums + '_.:-')
comparison_op = oneOf(comparison_op_list)
value = Word(alphas + nums + '_.:-;')
single_condition = keyword + comparison_op + value
# Pyparsing expression for combined condition
lpar = Literal( '(' )
rpar = Literal( ')' )
logical_op = oneOf(logical_op_list)
combined_expr = Forward()
atom = single_condition | ( Suppress(lpar) + combined_expr + Suppress(rpar) )
combined_expr << Group(atom) + ZeroOrMore( logical_op + combined_expr )

# Examples
test_strings = [
    'keyword = value',
    'keyword1 = value1 & keyword2 = value2',
    'a=a & (b=b|c=c) & (d=d & (e=e|f=f))',
    'a=a & ((b=b|(c=c))) & (((d=d) & (e=e|f=f)))',
    'test1=A & test2=B | test3 = C'  # Parses fine, but later rejected in python code
]
for s in test_strings:
    print()
    print(s)
    print(combined_expr.parseString(s))

Output:

keyword = value
[['keyword', '=', 'value']]

keyword1 = value1 & keyword2 = value2
[['keyword1', '=', 'value1'], '&', ['keyword2', '=', 'value2']]

a=a & (b=b|c=c) & (d=d & (e=e|f=f))
[['a', '=', 'a'], '&', [['b', '=', 'b'], '|', ['c', '=', 'c']], '&', [['d', '=', 'd'], '&', [['e', '=', 'e'], '|', ['f', '=', 'f']]]]

a=a & ((b=b|(c=c))) & (((d=d) & (e=e|f=f)))
[['a', '=', 'a'], '&', [[['b', '=', 'b'], '|', [['c', '=', 'c']]]], '&', [[[['d', '=', 'd']], '&', [['e', '=', 'e'], '|', ['f', '=', 'f']]]]]

test1=A & test2=B | test3 = C
[['test1', '=', 'A'], '&', ['test2', '=', 'B'], '|', ['test3', '=', 'C']]
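The test strings note that the mixed expression is "later rejected in python code". One possible form of that check is sketched below, assuming results nest as in the output above: walk the parse result and flag any level that contains both '&' and '|'. The helper mixes_operators is hypothetical, not part of pyparsing; the grammar is repeated from the question so the block runs on its own.

```python
from pyparsing import (Word, alphas, nums, oneOf, Literal, Group, Suppress,
                       Forward, ZeroOrMore)

# Grammar from the question (only the Forward assignment uses <<= instead of <<)
keyword = Word(alphas + nums + '_.:-')
value = Word(alphas + nums + '_.:-;')
single_condition = keyword + oneOf(['=', '!=']) + value
combined_expr = Forward()
atom = single_condition | (Suppress(Literal('(')) + combined_expr
                           + Suppress(Literal(')')))
combined_expr <<= Group(atom) + ZeroOrMore(oneOf(['&', '|']) + combined_expr)


def mixes_operators(tokens):
    """Return True if any nesting level uses both '&' and '|'."""
    # Operators at this level are plain strings; parenthesized parts are groups
    ops = {t for t in tokens if isinstance(t, str) and t in ('&', '|')}
    if len(ops) > 1:
        return True
    # Recurse into nested groups (single conditions contain no '&' or '|')
    return any(mixes_operators(t) for t in tokens if not isinstance(t, str))


for s in ['a=a & (b=b|c=c) & (d=d & (e=e|f=f))', 'test1=A & test2=B | test3 = C']:
    parsed = combined_expr.parseString(s, parseAll=True)
    print(s, '->', 'rejected' if mixes_operators(parsed) else 'ok')
```

This keeps the grammar permissive and moves the "no mixing" rule into a post-parse check, which also makes it easy to produce a custom error message.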

From this result I can handle valid input in regular Python. The problem is that some invalid input is also accepted without error:

invalid_strings = [
    'a b = c',  # Rightfully rejected 
    'word1 = word2 != word3', # Read as 'word1 = word2'
    'keyword = value_with_*illegal*_characters' # Read as 'keyword = value_with_'
]
for s in invalid_strings:
    try:
        result = combined_expr.parseString(s)
    except ParseException:
        result = None
    print()
    print(s)
    print(result)

Output:

a b = c
None

word1 = word2 != word3
[['word1', '=', 'word2']]

keyword = value_with_*illegal*_characters
[['keyword', '=', 'value_with_']]
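The last two results follow from the fact that parseString succeeds as soon as a leading portion of the text matches; anything after it is ignored unless the grammar requires the end of input. A minimal sketch on the single-condition rule alone, where accepts is a hypothetical helper written for this demo:

```python
from pyparsing import Word, alphas, nums, oneOf, StringEnd, ParseException

# Single condition exactly as in the question
keyword = Word(alphas + nums + '_.:-')
value = Word(alphas + nums + '_.:-;')
single_condition = keyword + oneOf(['=', '!=']) + value


def accepts(expr, text, **kwargs):
    """Hypothetical helper: True if expr parses text without ParseException."""
    try:
        expr.parseString(text, **kwargs)
        return True
    except ParseException:
        return False


text = 'keyword = value_with_*illegal*_characters'

# Only the leading 'keyword = value_with_' is matched; the rest is dropped
print(single_condition.parseString(text))  # ['keyword', '=', 'value_with_']

# Either form below turns leftover, unparsed text into an error
print(accepts(single_condition + StringEnd(), text))     # False
print(accepts(single_condition, text, parseAll=True))    # False
```

parseAll=True behaves like appending StringEnd() to the expression being parsed.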

For the examples above, which have no logical combinations, I could probably use StringEnd to require that the string end immediately after the expression. But would that still work when conditions are combined with logical operators? Is there a way to require that the entire input be recognized by the grammar, or is this outside pyparsing's scope and a task for, e.g., PLY?
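A possible approach, sketched here and not verified against every edge case: attach StringEnd once, to the top-level expression only, rather than inside the recursive rule. It is then checked only after the outermost expression has consumed everything it can, so it should work for combined conditions as well. The sketch also swaps the question's ZeroOrMore(logical_op + combined_expr) for a rule that repeats a single operator kind, so mixing & and | without parentheses fails to parse. The names operand, expr, full_input and is_valid are illustrative choices, not pyparsing API.

```python
from pyparsing import (Word, alphas, nums, oneOf, Literal, Group, Suppress,
                       Forward, Optional, OneOrMore, StringEnd, ParseException)

keyword = Word(alphas + nums + '_.:-')
value = Word(alphas + nums + '_.:-;')
single_condition = Group(keyword + oneOf(['=', '!=']) + value)

expr = Forward()
# An operand is a single condition or any parenthesized expression
operand = single_condition | Group(Suppress(Literal('(')) + expr
                                   + Suppress(Literal(')')))
# After the first operand, repeat either '&' or '|', but never both
expr <<= operand + Optional(OneOrMore('&' + operand) | OneOrMore('|' + operand))

# StringEnd is applied once, to the whole input
full_input = expr + StringEnd()


def is_valid(text):
    """True if the entire text matches the grammar."""
    try:
        full_input.parseString(text)
        return True
    except ParseException:
        return False


for s in ['a=a & (b=b|c=c) & (d=d & (e=e|f=f))',
          'test1=A & test2=B | test3 = C',
          'word1 = word2 != word3']:
    print(s, '->', is_valid(s))
```

For 'cond1 & cond2 | cond3', the '&' repetition stops at '|', and StringEnd then fails at that position, so the error points at the first operator that breaks the rule.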
