I'm pretty new to nearley.js, and I would like to know what tokenizers/lexers do compared to rules. According to the website:
By default, nearley splits the input into a stream of characters. This is called scannerless parsing. A tokenizer splits the input into a stream of larger units called tokens. This happens in a separate stage before parsing. For example, a tokenizer might convert
512 + 10
into ["512", "+", "10"]: notice how it removed the whitespace, and combined multi-digit numbers into a single number.
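For instance, here is roughly what that tokenization looks like with Moo (a minimal sketch; the rule names ws, number, and plus are my own):

```js
const moo = require("moo");

// Three token rules: whitespace, multi-digit numbers, and the plus sign.
const lexer = moo.compile({
  ws:     / +/,
  number: /[0-9]+/,
  plus:   "+",
});

lexer.reset("512 + 10");

const tokens = Array.from(lexer)       // a Moo lexer is iterable
  .filter(tok => tok.type !== "ws")    // drop the whitespace tokens
  .map(tok => tok.value);

console.log(tokens);                   // [ '512', '+', '10' ]
```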
Wouldn't that be the same as:
Math -> Number _ "+" _ Number
Number -> [0-9]:+
_ -> [\s]:*
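As far as I can tell, that scannerless grammar handles the same input on its own. A rough sketch of using it (assuming it is saved as math.ne and compiled with nearleyc math.ne -o math.js; the file names are mine):

```js
const nearley = require("nearley");
const grammar = require("./math.js");   // output of: nearleyc math.ne -o math.js

const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
parser.feed("512 + 10");                // fed as raw characters, no tokenizer

// Each entry in results is one complete parse of the input.
console.log(parser.results.length);
```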
I don't see what the purpose of lexers is. It seems that rules are always usable in this case, and that there is no need for lexers.
After fiddling around with them, I found out the use of tokenizers. Say we had the following:
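Something along these lines (a minimal sketch; the rule names are mine):

```
Main       -> Keyword | Identifier
Keyword    -> "if"
Identifier -> [a-z]:+
```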
This won't work: if we try to parse with this grammar, the result is ambiguous, because "if" is matched as both a keyword and an Identifier. A tokenizer, however, avoids this:
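(Again a minimal sketch, this time using Moo's keyword support; the token names ws, identifier, and kw_if are mine.)

```
@{%
const moo = require("moo");

// Moo carves keywords out of identifiers inside the lexer, so "if"
// only ever comes out as a kw_if token, never as an identifier.
const lexer = moo.compile({
  ws:         /[ \t]+/,
  identifier: { match: /[a-z]+/, type: moo.keywords({ kw_if: "if" }) },
});
%}

@lexer lexer

Main       -> Keyword | Identifier
Keyword    -> %kw_if
Identifier -> %identifier
```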
Parsing with this grammar does not produce an ambiguous result, because tokenizers are smart (at least the one shown in this example, which is Moo).