PEG grammar to parse an identifier name that is not a keyword


I am using Pest.rs for parsing. I need to parse identifiers but reject them if they happen to be reserved keywords. For example, "bat" is a valid identifier name, but "this" is not, since it has a specific meaning. My simplified grammar is below.

keyword = {"this" | "function"}
identifier = {ASCII+}
valid_identifier = { !keyword ~ identifier }

This works, but it also rejects identifier names like "thisBat". In effect, it only checks that the identifier does not start with a keyword, whereas I want to check the full identifier against the keywords.

Answer 1

Assuming that identifiers consist of alphanumeric characters, one option is:

keyword = {"this" | "function"}
identifier = @{ !(keyword ~ !ASCII_ALPHANUMERIC) ~ ASCII_ALPHANUMERIC+ }

!(keyword ~ !ASCII_ALPHANUMERIC) rejects an identifier that starts with a keyword only if the character following the keyword cannot be part of the identifier (or there is no following character), that is, only if the whole identifier is the keyword. This means that "thisBat" is an acceptable identifier, but "this" is not.
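To make the lookahead behaviour concrete, here is a minimal hand-written Rust sketch that mimics how a PEG engine evaluates this rule, assuming only the two keywords above. It does not use the pest crate: each function takes the remaining input and returns the number of bytes consumed, or None on failure. The function names mirror the grammar rules; the helper is_ascii_alnum_at and the sample words are illustrative only.

```rust
// Illustrative mimic of the pest rules below; this is NOT pest's API.
//   keyword    = { "this" | "function" }
//   identifier = @{ !(keyword ~ !ASCII_ALPHANUMERIC) ~ ASCII_ALPHANUMERIC+ }
// Each function returns Some(bytes consumed) on success, None on failure.

const KEYWORDS: [&str; 2] = ["this", "function"];

/// keyword: PEG ordered choice, the first alternative that matches wins.
fn keyword(input: &str) -> Option<usize> {
    KEYWORDS
        .iter()
        .find(|kw| input.starts_with(**kw))
        .map(|kw| kw.len())
}

/// Hypothetical helper: is the byte at `pos` an ASCII_ALPHANUMERIC character?
fn is_ascii_alnum_at(input: &str, pos: usize) -> bool {
    input.as_bytes().get(pos).map_or(false, |b| b.is_ascii_alphanumeric())
}

/// identifier = @{ !(keyword ~ !ASCII_ALPHANUMERIC) ~ ASCII_ALPHANUMERIC+ }
fn identifier(input: &str) -> Option<usize> {
    // Negative lookahead (consumes nothing): fail if a keyword matches here
    // AND is not followed by another identifier character.
    if let Some(kw_len) = keyword(input) {
        if !is_ascii_alnum_at(input, kw_len) {
            return None;
        }
    }
    // ASCII_ALPHANUMERIC+: one or more alphanumeric characters.
    let len = input.bytes().take_while(|b| b.is_ascii_alphanumeric()).count();
    if len > 0 { Some(len) } else { None }
}

fn main() {
    for word in ["bat", "this", "thisBat", "function", "functional", "this + 1"] {
        println!("{:?} -> {:?}", word, identifier(word));
    }
}
```

It accepts "bat", "thisBat" and "functional" and rejects "this", "function" and "this + 1" (where the identifier would be just "this"). The sketch also makes one caveat visible: PEG choice is ordered, so if one keyword were a prefix of another (say, hypothetical keywords "in" and "int"), the longer one would need to be listed first. With "in" | "int", keyword matches "in" on the input "int", the boundary check then sees "t", and "int" is accepted as an identifier.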

Answer 2

Figured out a hack to address this.

keyword = {"this" | "function"}
identifier = {ASCII+}
valid_identifier = @{ !keyword ~ identifier | keyword ~ identifier }

The new second alternative in valid_identifier matches the valid case that the first one rejects: a keyword followed by more identifier characters, such as "thisBat". Note that I have made valid_identifier atomic (@) so that implicit whitespace is not inserted between the parts and the parse output is a single "thisBat" token rather than separate "this" and "Bat" tokens.
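A similar hand-written Rust sketch (again without the pest crate) can mimic the ordered choice in valid_identifier. It assumes, as pest documents for its built-in ASCII rule, that ASCII matches any character from '\x00' to '\x7f'. The function names mirror the grammar rules and the sample inputs are illustrative only.

```rust
// Illustrative mimic of the pest rules below; this is NOT pest's API.
//   keyword          = { "this" | "function" }
//   identifier       = { ASCII+ }
//   valid_identifier = @{ !keyword ~ identifier | keyword ~ identifier }

const KEYWORDS: [&str; 2] = ["this", "function"];

/// keyword: PEG ordered choice, the first alternative that matches wins.
fn keyword(input: &str) -> Option<usize> {
    KEYWORDS.iter().find(|kw| input.starts_with(**kw)).map(|kw| kw.len())
}

/// identifier = { ASCII+ }: assumes ASCII matches any byte in 0x00..=0x7F.
fn identifier(input: &str) -> Option<usize> {
    let len = input.bytes().take_while(|b| b.is_ascii()).count();
    if len > 0 { Some(len) } else { None }
}

/// valid_identifier = @{ !keyword ~ identifier | keyword ~ identifier }
fn valid_identifier(input: &str) -> Option<usize> {
    match keyword(input) {
        // First alternative: no keyword here, so parse a plain identifier.
        // (If it fails, the second alternative fails too, since keyword fails.)
        None => identifier(input),
        // Second alternative: the keyword, then at least one more identifier character.
        Some(kw_len) => identifier(&input[kw_len..]).map(|rest| kw_len + rest),
    }
}

fn main() {
    for word in ["bat", "this", "thisBat", "this Bat"] {
        println!("{:?} -> {:?}", word, valid_identifier(word));
    }
}
```

With this grammar, "bat" and "thisBat" are accepted and a bare "this" is rejected. Because ASCII also matches spaces, "this Bat" is accepted as a single identifier as well; using ASCII_ALPHANUMERIC+ in identifier instead would reject it.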