ANTLR grammar : Understand CP1252 euro character


My grammar is a simple one, but I want it to accept some strings so that they can be concatenated. This formula has to be valid:

CONCATENATE(10;" €" )

The problem is the euro symbol. I used to put this in my grammar, and it worked well for the degree symbol:

fragment SPECIAL        :   '\u00B0';

But the euro symbol does not work the way the degree symbol does:

fragment SPECIAL        :   '\u00B0' | '\u20AC';

I'm generating a PHP parser with ANTLR 3.4, and the generated lexer code for the degree symbol is the following:

$this->getToken('176')== $LA26 || ...

It should add the following for the euro symbol. If I add it manually after generating the parser (there are 2 places to add it), it works!

$this->getToken('128')== $LA26 || ...

My question is: how do I add it in the grammar so that this code gets generated? Is there a problem with this range of Unicode symbols, which start with something other than \u00...? All my other SPECIAL characters start with \u00.

Thanks a lot for the time spent with me. Sincerely Nicolas.


There is 1 best solution below


If your parser uses CP1252 input, how do you expect it to work with tokens defined in terms of Unicode code points?

If the input is CP1252, you need to use that charset's code points. In CP1252 the euro sign is the single byte 0x80 (decimal 128), so use \u0080 for it.
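The degree symbol works because it has the same value in both encodings: U+00B0 in Unicode and byte 0xB0 (decimal 176) in CP1252. The euro sign differs: it is U+20AC (decimal 8364) in Unicode but byte 0x80 (decimal 128) in CP1252, which matches the 128 that had to be patched in by hand. A minimal sketch of the adjusted rule follows. It assumes the lexer receives the raw CP1252 bytes without any conversion to Unicode; if the input were decoded to Unicode first, '\u20AC' would be the value to match instead.

```
// Sketch only: assumes the input stream is raw CP1252 bytes.
// '\u00B0' = degree sign (176 in both CP1252 and Unicode)
// '\u0080' = euro sign as encoded in CP1252 (128), not U+20AC
fragment SPECIAL        :   '\u00B0' | '\u0080';
```

With this rule the generated lexer should compare against 128 for the euro sign, so the manual edit after generation would no longer be needed.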