Tiny DSL implementation in Python

2.8k Views Asked by At

I am researching to implement a DSL in Python and i am looking for a small DSL language that is friendly for someone who had no experience with designing and implementing languages. So far, i reviewed two implementations which are Hy and Mochi. Hy is actually a dialect of lisp and Mochi seems to be very similar to Elixir. Both are complex for me, right now as my aim is to prototype the language and play around with in in order to find if it really helps in solving the problem and fits to the style that problem requires or not. I am aware that Python has good support via language tools provided in Standard library. So far i implemented a dialect of lisp which is very simple indeed, i did not used python AST whatsoever and it is purely implemented via string processing which is absolutely not flexible for what i am looking for.

Are there any implementations rather than two languages mentioned above, small enough to be studied ?

What are some good books ( practical in a sense that does not only sticks to theoritical and academic aspect ) on this subject ?

What would be a good way into studying Python AST and using it ?

Are there any significant performance issues related to languages built upon Python ( like Hy ) in terms of being overhead on the actual produced bytecode ?

Thanks

2

There are 2 best solutions below

4
On BEST ANSWER

You can split the task of creating a (yet another!) new language in at least two big steps:

  • Syntax
  • Semantics & Interpretation

Syntax

You need to define a grammar for your language, with production rules that specify how to create complex expressions from simple ones.

Example: syntax for LISP:

expression ::= atom   | list
atom       ::= number | symbol    
number     ::= [+-]?['0'-'9']+
symbol     ::= ['A'-'Z''a'-'z'].*
list       ::= '(' expression* ')'

How to read it: an expression is either an atom or a list; an atom is a number or a symbol; a number is... and so on.

Often you will define also some tokenization rules, because most grammars work at token level, and not at characters level.

Once you defined your grammar, you want a parser that, given a sentence (a program) is able to build the derivation tree, or the abstract syntax tree.

For example, for the expression x=f(y+1)+2, you want to obtain the tree:

abstract syntax tree for <code>x=f(y+1)+2</code>

There are several parsers (LL, LR, recursive descent, ...). You don't necessarily need to write your language parser by yourself, as there are tools that generate the parser from the grammar specification (LEX & YACC, Flex & Bison, JavaCC, ANTLR; also check this list of parsers available for Python).

If you want to skip the step of designing a new grammar, you may want to start from a simple one, like the grammar of LISP. There is even a LISP parser written in Python in the Pyperplan project. They use it for parsing PDDL, which is a domain specific language for planning that is based on LISP.

Useful readings:

Semantics & Interpretation

Once you have the abstract syntax tree of your program, you want to execute your program. There are several formalisms for specifying the "rules" to execute (pieces of) programs:

  • Operational semantics: a very popular one. It is classified in two categories:
    • Small Step Semantics: describe individual steps of computation
    • Big Step Semantics: describe the overall results of computation
  • Reduction semantics: a formalism based on lambda calculus
  • Transition semantics: if you look at your interpreter like a transition system, you can specify its semantics using transition semantics. This is especially useful for programs that do not terminate (i.e. run continuously), like controllers.

Useful readings:

4
On

You don't really need to know a lot about parsing to write your own language.

I wrote a library that lets you do just that very easily: https://github.com/erezsh/lark

Here's a blog post by me explaining how to use it to write your own language: http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/

I hope you don't mind my shameless plug, but it seems very relevant to your question.