Nearley Tokenizers vs Rules


I'm pretty new to nearley.js, and I would like to know what tokenizers/lexers do compared to rules. According to the website:

By default, nearley splits the input into a stream of characters. This is called scannerless parsing. A tokenizer splits the input into a stream of larger units called tokens. This happens in a separate stage before parsing. For example, a tokenizer might convert 512 + 10 into ["512", "+", "10"]: notice how it removed the whitespace, and combined multi-digit numbers into a single number.
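If I'm reading that right, a tokenizer such as moo (which nearley supports) would do something like the following; the rule names here are just my own sketch:

const moo = require("moo");

const lexer = moo.compile({
  ws:     /[ \t]+/,
  number: /[0-9]+/,
  plus:   "+",
});

lexer.reset("512 + 10");

// Collect the token values, dropping the whitespace tokens:
const values = [];
let tok;
while ((tok = lexer.next())) {
  if (tok.type !== "ws") values.push(tok.value);
}
console.log(values); // [ "512", "+", "10" ]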

Wouldn't that be the same as:

Math -> Number _ "+" _ Number
Number -> [0-9]:+
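where _ is optional whitespace. A complete version of that grammar (just a sketch of what I mean, with postprocessors added to compute the result) might be:

# A scannerless grammar: every rule works directly on characters.
Math   -> Number _ "+" _ Number  {% d => d[0] + d[4] %}
Number -> [0-9]:+                {% d => parseInt(d[0].join("")) %}
_      -> [\s]:*                 {% () => null %}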

I don't see what the purpose of lexers is. Rules seem perfectly usable in this case, so there appears to be no need for a lexer.

1 Answer


After fiddling around with them, I found out what tokenizers are for. Say we had the following:

Keyword -> "if" | "else"
Identifier -> [a-zA-Z_]:+

This won't work: the grammar is ambiguous, because "if" can be matched as both a Keyword and an Identifier. A tokenizer, however:

{
  "keyword": /if|else/,
  "identifier": /[a-zA-Z_]+/
}

does not have this problem: the lexer decides up front whether "if" is a keyword or an identifier, so the parser never sees both possibilities (at least with the tokenizer shown in this example, which is moo).
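For reference, wiring a moo lexer like that into a nearley grammar looks roughly like this (just a sketch; I added a ws rule so spaces don't throw, and the %token names have to match the lexer's rule names):

@{%
const moo = require("moo");

const lexer = moo.compile({
  ws:         /[ \t]+/,   // added so whitespace in the input doesn't throw
  keyword:    /if|else/,
  identifier: /[a-zA-Z_]+/,
});
%}

@lexer lexer

# Rules now match whole tokens from the lexer instead of single characters.
Keyword    -> %keyword    {% id %}
Identifier -> %identifier {% id %}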