Is a Chomsky type 1 parser generator possible?

814 Views Asked by At

Looking at the Chomsky hierarchy of grammars I find that for Type 2 grammars (context free grammars) very good tools exist to aid in creating software to read these from text into a usable data structure in memory. In my experience Antlr4 is one of these tools that do this.

Recently I was confronted with Markdown which turns out to be a context sensitive grammar (i.e. Chomsky Type 1) and as such no grammar definition for Antlr4 exists. See here and here.

So I was wondering;

  • Is there a tool that aids in creating a parser for a context sensitive grammar?
  • If not, is such a thing even possible?
2

There are 2 best solutions below

1
On

I've actually encountered one. Quinn Taylor Jackson's meta-S seems to do what OP is looking for. This wikipedia articles discusses it and a number of similar systems but meta-S seems more practical than most of the other discussed there: https://en.wikipedia.org/wiki/Adaptive_grammar#cite_note-Jackson2006-3

At quora, I analyze how this stands up in practice against GLR parsers: https://www.quora.com/What-is-the-most-powerful-parser-algorithm

0
On

Most, if not all, real-world programming languages are context-sensitive. A typical parser accepts some constructs which will later be rejected as violating some rule or constraint, and/or rely on some kind of context-sensitive preprocessing to produce a parseable token stream. Macro languages like the C preprocessor obviously fall into the latter category, but at a much more controllable scale, so does the Python scanner which inserts INDENT and DEDENT tokens. Examples of the first include the common requirement that variables be declared before use or that function calls have the same number of arguments as there are parameters in the prototype. Static type analysis also falls into this category.

So all practical parsers deviate to some extent from the idealised model shown in textbooks on formal language theory. And practical parser generators like bison and antlr provide customisable hooks which permit idiosyncratic implementation of these deviations.

In short, parser generators can be useful, and are used, for languages which are "technically" context-sensitive (and I use scare quotes because context-sensitivity is a binary attribute; either it is or it isn't).

It would be great to avoid idiosyncratic hacks, but the problem seems intractable. Since a context-sensitive language can model a Turing machine, a deterministic parse would have to be able the solve the Halting Problem. On the other hand, there are deterministic parsing algorithms for certain subsets of the set of context-sensitive languages, but they don't seem to provide for all the context-sensitivity which is present in practical languages. We still lack the sweet spot between too powerful to parse and too weak to replace the hacks.

This is still a fertile research field, and perhaps you will have something to contribute. But if you are looking for something off-the-shelf, I fear you will be disappointed.