Generate output along with recursive descent parser

1.5k Views Asked by At

Writing simple parsers are trivial and I have implemented several over the years. In college, we also had to write one. But we never had to generate meaningful output using this method; we never learned how to create a back end.

If I have a working recursive descent parser for a simplified Pascal and I wanted to translate the code to C++, how would I go about doing this? I dont see a need for an intermediate step such as generating an abstract syntax tree.

So how do I go about outputting the compiled or translated code? The only useful example I have found of this is in Jack Crenshaw's tutorial but it focuses more on the front end, as does most other resources. The relationship between my parser code and my grammar is very obvious. What about the relationship between the parsers methods and the outputting? My parser method for a function declaration can relate to exactly one EmitLn(C++ code here) call. But what about parser methods that aren't that easy, such as expressions. Expressions are broken down to possibly many more calls, so that alludes to the need of a Emit() function that allows me to break down the output code for an expression piece by piece. Is there any boiler plate code for outputting code, such as the EmitLn function in Jack Crenshaw's Lets Build a Compiler? This also shows me I need to maintain a basic symbol table, another thing that is often omitted in most examples.

Am I right? What else should I know? Any tips/suggestions, or resources? My big question is, there are so many tutorials out there for compiler front ends, but how about some explanation on the back end? I can parse languages, now I want to evolve that into being able to translate and compile them into other languages.

2

There are 2 best solutions below

2
On BEST ANSWER

There are trivial compilers using "on the fly" code generators that spit code as they parse, as you seem to be trying to do. The good news is that they look easy.

The problem is that compilation is fundamentally about spreading information from one part of the code to another, to enable good code generation. For that, compiler people long ago decided to separate parsing from code generation. The parser parses the code and builds another intermediate data structure (often an abstract syntax tree or triples) that represents the program.

Often, very deep analysis is then lavished on the intermediate structure to determine how to generate good code, followed by appropriate good code generation. The techniques for doing this well are complex, but well motivated by the need to produce good (and correct) code.

Try checking out the standard compiler resources. Learning to write a compiler

It is well worth your time to read some of these references in detail instead of hacking. If you insist on just one, Aho and Ullman is the classic book.

If you want to see the how on the fly code generation can be made to work reasonably well, check out metacompilers.

2
On

Short answer After matching a rule in your grammar, call a function that does something based on the input.

Long answer: From Language Implementation Patterns by Terence Parr discussing Syntax-Directed Interpreters:

a syntax-directed interpreter doesn't create an AST or translate the source code to bytecodes... The interpreter directly feeds off of syntax to execute statements.

This is precisely the idea that I was going for in my original question. Continuing in the book he describes the technique, how it works, what it consists of, and the limitations. It works for DSL and small languages rather than general purpose languages.

He states that constructs such as IFs and loops are awkward to implement. While this is likely true, I had no trouble implementing such features in a language of mine, Peay BASIC. Languages that are sequences of instructions or simple statements are ideal for this pattern; he gives text-processing languages as an example.

The grammar is extended to consist of "match this, call that" pairs. So the neccessary rules in your grammar should have an action associated with it. Example of an assignment statement:

assignment : ID '=' expr {interp.assign($ID, $expr.value);};
expr returns [Object value] : ... ;