Using ANTLR, I’m trying to parse input files that contain references to other files, just like the C language's `#include "[insert file name]"`.
One suggested approach is:
- Parse the root file, saving the references as nodes (i.e., specific grammar rules)
- Visit the tree, searching for "reference" nodes
- For each reference node, parse the referenced file and replace the node with the newly generated tree
- Repeat this process recursively to handle multiple levels of inclusion
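The suggested tree-level approach can be sketched in C# on a toy tree. Everything here is hypothetical: `Node`, `TextNode`, and `IncludeNode` stand in for parse-tree nodes, and the `parseFile` delegate stands in for running the parser on the referenced file. That delegate is exactly the step that breaks when the referenced file is only a fragment.

```csharp
using System;
using System.Collections.Generic;

// Toy parse tree: a file is a list of nodes; IncludeNode plays the "reference" grammar rule.
public abstract record Node;
public sealed record TextNode(string Value) : Node;
public sealed record IncludeNode(string FileName) : Node;

public static class IncludeExpander
{
    // parseFile is a hypothetical stand-in for "parse the referenced file into nodes".
    public static List<Node> Expand(IEnumerable<Node> nodes, Func<string, IEnumerable<Node>> parseFile)
        => Expand(nodes, parseFile, new HashSet<string>());

    private static List<Node> Expand(
        IEnumerable<Node> nodes, Func<string, IEnumerable<Node>> parseFile, HashSet<string> inProgress)
    {
        var result = new List<Node>();
        foreach (Node node in nodes)
        {
            if (node is IncludeNode include)
            {
                // A file that is already being expanded means an include cycle.
                if (!inProgress.Add(include.FileName))
                    throw new InvalidOperationException($"include cycle through '{include.FileName}'");
                // Substitute the node with the referenced file's nodes; recursion handles nesting.
                result.AddRange(Expand(parseFile(include.FileName), parseFile, inProgress));
                inProgress.Remove(include.FileName);
            }
            else
            {
                result.Add(node);
            }
        }
        return result;
    }
}
```

For example, if file `a` contains `a1`, an include of `b`, and `a2`, and `b` contains `b1` plus an include of `c`, expanding `a` yields `a1, b1, c1, a2`.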
The problem with this solution is that the referenced files may contain only fragments (for example, an include inside the body of a C function). To parse such files, I would have to implement a separate parser for the fragmented grammar.
Is there a recommended approach to (literally) inject the new file into the ongoing parsing process?
One solution to this problem is to override the scanner's behavior, specifically its `NextToken()` method. This is necessary because the EOF token cannot be handled in the ANTLR lexer grammar (to the best of my knowledge): any actions attached to a lexer rule that recognizes EOF are simply ignored (as shown in the code below). Thus, this behavior has to be implemented directly in the scanner method.
So assume we have a parser grammar and a lexer grammar. A `static public Stack<ICharStream>` (e.g., `mySpecialFileStack`) should be introduced in the grammar's members. This stack stores the character streams associated with the files that take part in the parsing; a character stream is pushed onto it whenever a new file is encountered through an include statement.
The overriding `NextToken()` method goes in the .g4.cs file, whose purpose is to extend the generated scanner class; this is possible because the generated scanner class is declared with the `partial` keyword.
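The partial-class mechanism can be seen in isolation in this self-contained C# sketch. `LexerBase` is a hypothetical, heavily simplified stand-in for the runtime's `Lexer` class, and `MyLexer` for a generated lexer. In a real project the two `partial` parts would live in separate files (the generated `.cs` and your hand-written `.g4.cs`); the compiler merges them into one class either way.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the runtime's Lexer base class: just a virtual NextToken().
public abstract class LexerBase
{
    public virtual string NextToken() => "<EOF>";
}

// Part 1: what the tool would generate (normally MyLexer.cs, never edited by hand).
public partial class MyLexer : LexerBase
{
    // Generated rules, ATN, and vocabulary would live here.
}

// Part 2: the hand-written extension (normally MyLexer.g4.cs), merged into the same class.
public partial class MyLexer
{
    public List<string> Log { get; } = new List<string>();

    public override string NextToken()
    {
        string token = base.NextToken(); // the base class's normal behavior
        Log.Add(token);                  // custom logic added without touching generated code
        return token;
    }
}
```

This sketch wraps `base.NextToken()` rather than copying the runtime's body; in both cases the override lives only in the hand-written part, so regenerating the lexer does not overwrite it.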
After the partial scanner class for the grammar has been generated, open the source code of the ANTLR4 `Lexer` class in the ANTLR runtime, copy ALL of its original `NextToken()` code into this new method, and, inside the middle do-while block (right after the try-catch block), add the following code:
The full body of the NextToken() method override is
Now, wherever your grammar recognizes a reference to a file that should be parsed (the include statement), add the following action:
Finally, the main program is given below; note that the root file is pushed onto the `ICharStream` stack first.
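The whole mechanism (a stack of character streams with the root file pushed first, an include action that pushes the referenced file, and a `NextToken()` that pops a finished file at EOF instead of reporting EOF) can be modeled without ANTLR. The sketch below is not ANTLR code: `SourceStream` is a hypothetical stand-in for `ICharStream`, `IncludeLexer` is a toy tokenizer standing in for the generated scanner with its override, and a dictionary of file contents replaces the file system.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for ICharStream: one file's text plus a read position.
public sealed class SourceStream
{
    public SourceStream(string name, string text) { Name = name; Text = text; }

    public string Name { get; }
    public string Text { get; }
    public int Index { get; set; }
    public bool AtEnd => Index >= Text.Length;
    public char Current => Text[Index];
}

// Each token records the file it came from.
public sealed record Token(string Type, string Text, string SourceName);

// Toy lexer that splices #include "file" directives into a single token stream.
public sealed class IncludeLexer
{
    // Matches an include directive starting exactly at the current position (\G).
    private static readonly Regex IncludeDirective = new Regex(@"\G#include\s*""([^""]+)""");
    private const int MaxIncludeDepth = 64; // crude guard against include cycles

    // Plays the role of mySpecialFileStack: the top is the file currently being lexed.
    private readonly Stack<SourceStream> _streams = new Stack<SourceStream>();
    private readonly IReadOnlyDictionary<string, string> _files; // hypothetical file system

    public IncludeLexer(string rootName, IReadOnlyDictionary<string, string> files)
    {
        _files = files;
        _streams.Push(new SourceStream(rootName, files[rootName])); // root file is pushed first
    }

    public Token NextToken()
    {
        while (true)
        {
            SourceStream s = _streams.Peek();
            while (!s.AtEnd && char.IsWhiteSpace(s.Current)) s.Index++;

            if (s.AtEnd)
            {
                // EOF of an included file: pop it and resume the includer where it stopped.
                if (_streams.Count > 1) { _streams.Pop(); continue; }
                return new Token("EOF", "", s.Name); // only the root file's EOF is returned
            }

            Match include = IncludeDirective.Match(s.Text, s.Index);
            if (include.Success)
            {
                // The "include action": consume the directive, then push the referenced file.
                s.Index += include.Length;
                if (_streams.Count >= MaxIncludeDepth)
                    throw new InvalidOperationException("include nesting too deep (cycle?)");
                string name = include.Groups[1].Value;
                _streams.Push(new SourceStream(name, _files[name]));
                continue;
            }

            int start = s.Index;
            if (char.IsLetterOrDigit(s.Current))
            {
                while (!s.AtEnd && char.IsLetterOrDigit(s.Current)) s.Index++;
                return new Token("WORD", s.Text.Substring(start, s.Index - start), s.Name);
            }

            s.Index++; // any other character becomes a one-character punctuation token
            return new Token("PUNCT", s.Text.Substring(start, 1), s.Name);
        }
    }
}
```

With `main.c` containing `int f() { #include "body.inc" }` and `body.inc` containing `return 1;`, the consumer receives `int f ( ) { return 1 ; }` followed by a single EOF, so the fragment is never parsed on its own. In a real ANTLR lexer, the same steps would operate on `ICharStream` objects inside the `NextToken()` override. One way to point the lexer at a different stream is the runtime's `SetInputStream` method, but verify how your runtime version resets lexer state (position, line numbers, modes) when switching back to the including file.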