I am creating a REPL for Linux commands.
My grammar for a command is `call: WS? (redirection WS)* argument (WS atom)* WS?`, so once parsing is done, whitespace always shows up as nodes in the parse tree. I understand that `WS` has to be in the grammar to match the command line correctly, but I want to filter those nodes out after parsing.
I tried adding `%ignore WS` at the end of the file, but it didn't work.
You can use a Transformer and have the method for the `WS` token return `Discard`.
Transformers make it much easier to convert the result of the parsing into the format that you need for the rest of your program. Since you didn't include your grammar, and your specific use case is too complex to replicate quickly, I'll show an example using the following basic grammar:
Before defining a transformer, we can see that all ints and spaces are present in the parsed tree:
We can define a simple transformer that only transforms `WS`, which results in the same tree as before, but with the `WS` tokens removed. The transformer can be expanded further to handle more of the defined tokens:
That results in the values being proper integers, but they are still in the tree:
We can take it one step further and define a method for the rule as well - each method in a Transformer that matches a token or rule will automatically be called for each matching parsed value:
Now when we transform the tree, we get a list of ints instead of a tree:
While my example used very simple types, you could define a method for your `command` rule that returns a `Command` object, or whatever you have defined to represent it. For rules that contain other rules, the outer rules will receive the already transformed objects, just like `ints` received int objects.
There are also some customizations you can apply to how the transformer methods receive arguments by using the `v_args` decorator.