Goal: find a way to formally define a grammar that recognizes elements from a set 0 or 1 times in any order. Subsequently, I want to parse it and generate an AST as well.
For example: Say the set of valid strings in my language is {A, B, C}
. I want to define a grammar that recognizes all valid permutations of any number of those elements.
Syntactically valid strings would include:
- (the empty string)
A
,B A
, andC A B
Syntactically invalid strings would include:
A A
, andB A C B
To be clear, defining all possible permutations explicitly in a CFG is unacceptable for my purposes, since larger sets would be impossible to maintain.
From what I understand, such a language fails the pumping lemma for context free languages, so the solution will not be context free or regular.
Update
What I'm after is called a "permutation language", which Benedek Nagy has done some theoretical work on as an extension to context free languages.
Regarding a parser generator, I've only found talk of implementing parsers with a permutation phase (link). Parsers evidently have an exponential lower bound on the size of resulting CFG, and I haven't found any parser generators that support it anyhow.
A sort-of solution to this problem was written in ANTLR. It uses semantic predicates to 'code around' the issue.
Assuming that the set of alternative strings is fixed and known in advance, say of size n, one can come up with a (non context-free) grammar of size O(n!). This is not asymptotically smaller than enumerating all permutations, so I suppose it cannot be considered a good solution. I believe that this grammar can be reformulated as a context-sensitive grammar (although in the form I'm suggesting below it is not).
For the example
{a, b, c}
mentioned in the question, one such grammar is the following. I'm using lower case letters for terminal symbols and upper case letters for non-terminals, as is customary.S
is the initial non-terminal symbol.Non-terminals
X
andY
enclose the substring in the production which has not been finalized yet; this substring will eventually be replaced by a permutation of the terminals that are given betweenX
andY
(in some arbitrary order).