Is it possible to append terminals retrieved from a text file to a lexicon in Rascal? This would happen at run time, and I see no obvious way to achieve this. I would rather keep the data separate from the Rascal project. For example, if I had read in a list of countries from a text file, how would I add these to a lexicon (using the lexical
keyword)?
Append text file to lexicon in Rascal
96 Views Asked by rien333 At
2
There are 2 best solutions below
0

A completely different answer would be to dynamically update the grammar and generate a parser from this. This involves working against the internal grammar representation of Rascal like so:
set[str] lexicon = {"aap", "noot", "mies"};
syntax Word = ; // empty definition
typ = #Word;
grammar = typ.definitions;
grammar[sort("Word")] = { prod(sort("Word"), lit(x), {}) | x <- lexicon };
newTyp = type(sort("Word"), grammar);
This newType is a reified grammar + type for the definition of the lexicon, and which can now be used like so:
import ParseTree;
if (type[Word] staticGrammar := newType) {
parse(staticGrammar, "aap");
}
Now having written al this, two things:
- I think this may trigger unknown bugs since we did not test dynamic parser generation, and
- For a lexicon with a reasonable size, this will generate an utterly slow parser since the parser is optimized for keywords in programming languages and not large lexicons.
In the data-dependent version of the Rascal parser this is even easier and faster but we haven't released this yet. For now I'd write a generic rule with a post-parse filter, like so:
Because the
filter
function removed all possible derivations forhello
, the parser eventually returns a parse error onhello
. It does not do so foraap
which is in the lexicon, so hurray. Of course you can make interestingly complex derivations with this kind of filtering. People sometimes write ambiguous grammars and use filters like so to make it unambiguous.Parsing and filtering in this way is in cubic worst-case time in terms of the length of the input, if the filtering function is in amortized constant time. If the grammar is linear, then of course the entire process is also linear.