Confusion around Lark priorities

37 Views Asked by VicVic At 30 January 2024 at 16:18

I'm using Lark, an excellent Python parsing library.

It provides Earley and LALR(1) parsers, and grammars are defined in a custom EBNF format (EBNF stands for Extended Backus–Naur form).

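Lowercase definitions are rules; uppercase definitions are terminals. Lark also lets you attach a priority (a numeric weight written after the terminal name, as in NAME.-1 or _NOT.1) to a terminal to influence which terminal is matched when several could apply.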
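To make the idea of terminal priorities concrete, here is a minimal sketch of a toy lexer in plain Python using only the re module. It is not Lark's implementation, and it assumes a simplified model in which terminals are tried in order of descending priority and the first alternative that matches wins. The terminal table and the functions build_lexer and tokenize are hypothetical names made up for this illustration; only the NAME and BOOLEAN patterns are taken from the grammar in this question.

```python
import re

# Hypothetical terminal table: (name, priority, regex).
# Toy model only; this is NOT how Lark is implemented internally.
TERMINALS = [
    ("NAME", -1, r"_{0,2}[a-zA-Z][a-zA-Z_0-9%]*"),
    ("BOOLEAN", 0, r"TRUE|FALSE"),
    ("NUMBER", 0, r"\d+"),
]


def build_lexer(terminals):
    """Compile one alternation, highest-priority terminal first."""
    # sorted() is stable, so equal priorities keep their table order.
    ordered = sorted(terminals, key=lambda t: -t[1])
    pattern = "|".join(f"(?P<{name}>{regex})" for name, _, regex in ordered)
    return re.compile(pattern)


def tokenize(text, terminals):
    """Return a list of (terminal_name, matched_text) pairs."""
    lexer = build_lexer(terminals)
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # skip whitespace between tokens
            pos += 1
            continue
        match = lexer.match(text, pos)
        if match is None:
            raise ValueError(f"no terminal matches at position {pos}")
        # lastgroup is the name of the alternative that matched
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    # NAME has the lowest priority, so BOOLEAN is tried first.
    print(tokenize("TRUE", TERMINALS))

    # Raising NAME's priority makes it shadow BOOLEAN.
    name_first = [("NAME", 1, TERMINALS[0][2])] + TERMINALS[1:]
    print(tokenize("TRUE", name_first))
```

In this toy model, "TRUE" comes out as BOOLEAN when NAME has priority -1 and as NAME when NAME has priority 1. The model also shows a side effect of ordered alternation: an input such as "TRUEISH" is split into a BOOLEAN "TRUE" followed by a NAME "ISH", because the first alternative that matches wins even when a later one would match more text. Whether a real parser behaves this way depends on the parser and lexer it uses.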

I have defined a grammar, but I am a little stuck: I am unsure why it works and whether it is implemented well.

In particular, I don't understand why I need:

// column names
_NAME: /\_{0,2}[a-zA-Z][a-zA-Z_0-9\%]*/
NAME.-1: _NAME

to properly parse some of my formulas, such as:

SUMNAME := A+B
IF(1>0, TRUE, FALSE)

Here is the full grammar I came up with to give users a syntax for calculating KPIs:

?start: expr
    | NAME ":=" expr                                            -> create_statement

?expr: expr_or

?exprif: _IF "(" expr_or "," expr_or ["," expr_or] ")"          -> if_clause

?expr_or: expr_and
    | expr_and (_OR expr_and)+                                  -> or_

?expr_and: expr_cond
    | expr_cond (_AND expr_cond)+                               -> and_

?expr_cond: sum
    | sum COMPARISON sum                                        -> condition
    | sum _IN "(" list_expr ")"                                 -> is_in
    | sum _BETWEEN sum _AND sum                                 -> between
    | _NOT expr_atom                                            -> not_


?sum: product
    | sum "+" product                                           -> add
    | sum "-" product                                           -> sub

?product: division
    | product "*" division                                      -> mul

?division: power
    | power "/" power                                           -> div
    | _DIVIDE "(" expr_or "," expr_or ")"                       -> div

?power: exprfactor
    | power "**" exprfactor                                     -> pow
    | power "^" exprfactor                                      -> pow

?exprfactor: expr_atom
    | "-" expr_atom                                             -> neg

?expr_atom: atom
    | _SUM "(" list_expr ")" [_OVER"(" list_str ")"]            -> sum
    | _MEAN "(" list_expr ")" [_OVER"(" list_str ")"]           -> mean
    | _STD "(" list_expr ")" [_OVER"(" list_str ")"]            -> std
    | _MAX "(" list_expr ")"                                    -> max
    | _MIN "(" list_expr ")"                                    -> min
    | _COALESCE "(" list_expr ")"                               -> coalesce
    | _ABS "(" expr_or ")"                                      -> abs
    | _SQRT "(" expr_or ")"                                     -> sqrt
    | _FLOAT "(" expr_or ")"                                    -> float_
    | _MOD "(" expr_or "," atom ")"                             -> mod
    | _ROUND "(" expr_or "," atom ")"                           -> round
    | _YEAR "(" atom ")"                                        -> year
    | _QUARTER "(" atom ")"                                     -> quarter
    | _MONTH "(" atom ")"                                       -> month
    | _DAY "(" atom ")"                                         -> day
    | _HOUR "(" atom ")"                                        -> hour
    | _MINUTE "(" atom ")"                                      -> minute
    | _IS_NULL "(" expr_or ")"                                  -> is_null
    | _SECOND "(" atom ")"                                      -> second
    | _SUBSTR "(" expr_or "," atom ["," atom] ")"               -> substr
    | "(" expr ")"
    | exprif


?atom: NAME                                                     -> variable
    | NUMBER                                                    -> variable
    | BOOLEAN                                                   -> variable
    | STRING                                                    -> variable
    | NAN                                                       -> variable
    | NULL                                                      -> variable
    | DATETIME                                                  -> variable
    | DATE                                                      -> variable


list_expr: expr ("," expr)*
list_str: STRING ("," STRING)*


// over
_OVER: /\_{2}OVER\_{2}/

//special values
NAN: "nan"i
NULL: "null"i

// comparison operators
COMPARISON: GREATER | GREATER_OR_EQUAL | SMALLER | SMALLER_OR_EQUAL | EQUAL | UN_EQUAL
GREATER: ">"
GREATER_OR_EQUAL: GREATER"="
SMALLER: "<"
SMALLER_OR_EQUAL: SMALLER"="
EQUAL: "=="
UN_EQUAL: "!="
_BETWEEN: "between"i
_IN: "in"i

// IF token
_IF: "if"i

// logical operators.
_NOT.1: "not"i
_AND.1: "and"i
_OR.1: "or"i
_XOR.1: "xor"i

// boolean operators
_TRUE: "TRUE"
_FALSE: "FALSE"

// syntax tokens
_SUM: "sum"i
_MEAN: "mean"i
_STD: "std"i
_IS_NULL: "isnull"i | "is_null"i
_MAX: "max"i
_MIN: "min"i
_DIVIDE: "divide"i
_COALESCE: "coalesce"i
_SQRT: "sqrt"i
_FLOAT: "float"i
_MOD: "mod"i
_ROUND: "round"i
_ABS: "abs"i
_YEAR: "year"i
_QUARTER: "quarter"i
_MONTH: "month"i
_DAY: "day"i
_HOUR: "hour"i
_MINUTE: "minute"i
_SECOND: "second"i
_SUBSTR: "substr"i

// tokens for timeshifts
FREQUENCY: "M" | "Q" | "Y"
SHIFT: PLUS | MINUS
PYE: "PYE"
TIME_SHIFT: "T" SHIFT INT FREQUENCY | PYE

// operators
PLUS: "+"
MINUS: "-"

// time and date formats
DATETIME.1: "'" /\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/ "'"
DATE.1: "'" /\d{4}-\d{2}-\d{2}/ "'"

// strings
STRING : "'" _STRING_ESC_INNER "'" | "\"" _STRING_ESC_INNER "\""



// booleans
BOOLEAN: _TRUE | _FALSE

// column names
_NAME: /\_{0,2}[a-zA-Z][a-zA-Z_0-9\%]*/
NAME.-1: _NAME

%import common.INT
%import common.NUMBER
%import common.WS_INLINE
%import common._STRING_ESC_INNER

%ignore WS_INLINE
