How to match a single Unicode character in single quotes


My language has single-quoted Unicode character literals like:

'h'
''

etc.

I'm using the following rule to parse this:

CHAR = "'" (!"'" c:.) "'" { return c; }

This works for ASCII characters, but unfortunately not for Unicode.

How can I modify this to match a single Unicode character like the emoji above?


There are 2 best solutions below

divs1210 On BEST ANSWER

I solved this by parsing character literals as strings.

Then, in JS, I spread the string into individual Unicode code points.

If there is more than one code point, I throw a parse error.

Otherwise, I pick the first codepoint.
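
A minimal sketch of that post-processing step, assuming the grammar has already captured the text between the quotes as a plain string. The function name `parseCharLiteral` and the choice of `SyntaxError` are illustrative assumptions, not part of the original answer. It also treats an empty literal as an error, which the answer does not spell out.

```javascript
// Hypothetical helper: the name and the error type are illustrative assumptions.
function parseCharLiteral(text) {
  // Spreading a string iterates by Unicode code point rather than by
  // UTF-16 code unit, so a surrogate pair (e.g. an emoji) is one element.
  const codePoints = [...text];

  // Assumption: an empty literal is rejected as well as a multi-character one.
  if (codePoints.length !== 1) {
    throw new SyntaxError(
      `Character literal must contain exactly one code point, got ${codePoints.length}`
    );
  }

  return codePoints[0];
}
```

Inside a grammar action this could be called on the captured string, for example `{ return parseCharLiteral(s); }`, where `s` is whatever label the string rule uses.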

Yukulélé On

That seems to work:

CHAR = "'" c:$([\u0800-\uffff]?.) "'" { return c; }
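
For comparison, here is a rough sketch in plain JavaScript of why the original rule trips over emoji, assuming the grammar's `.` matches one UTF-16 code unit the way a regular expression without the `u` flag does. The helper name `matchCharLiteral` is made up for illustration and is not part of any parser generator.

```javascript
// Hypothetical helper for illustration only.
function matchCharLiteral(source) {
  // With the `u` flag, [^'] consumes one whole code point, so a character
  // stored as a surrogate pair still counts as a single character.
  const match = /^'([^'])'$/u.exec(source);
  return match ? match[1] : null;
}

// Without the `u` flag the same pattern sees an emoji as two code units
// and therefore rejects the literal.
const matchesWithoutUnicodeFlag = /^'([^'])'$/.test("'\u{1F600}'");
```

Unlike the optional-prefix pattern in the rule above, this check rejects a literal that contains two separate characters.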