What characters are usable in a variable name in ObjectScript on a "Unicode" installation?


I have a parser (in Java) for ObjectScript which works quite well, except for one thing: I don't parse "Unicode variable names".

The problem is that the documentation is not very explicit on this subject; what is more, it misdefines Unicode as "16 bits". This suggests to me that only characters within the BMP are allowed.

But which ones? The number of Unicode blocks defined in the JDK is frighteningly high, and scripts aren't any better.

I could maybe use Character.isLetter() (note that I chose the overload taking a char, not an int), but I suspect that even that would be too broad...


BEST ANSWER

Eduard was pretty much correct: a local variable name starts with a percent sign or an "alphabetic" character, followed by "alphabetic" characters or digits.

[\p{Alphabetic}%][\p{Alphabetic}\d]*

The most important thing to note here is what "alphabetic" means: a Latin letter, or a character that is alphabetic in the current Caché locale. For example, with a Russian/Unicode locale installed you could write something like:

set порусски = 1

or, with a Japanese locale:

USER>set a=$c(12354)

USER>set @a=88

USER>write

a="あ"
あ=88
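
Since the parser in question is written in Java, here is a minimal sketch of how the pattern above could be tried with java.util.regex. It is an illustration under assumptions, not a verified description of what Caché accepts: the class and method names are hypothetical, Java spells the Unicode binary property as \p{IsAlphabetic} (the bare \p{Alphabetic} form is rejected by java.util.regex.Pattern), and the locale dependence described above is not modelled at all.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not taken from any ObjectScript tooling.
final class LocalNamePattern {

    // First character: '%' or an alphabetic character.
    // Remaining characters: alphabetic characters or ASCII digits
    // (\d matches only 0-9 unless UNICODE_CHARACTER_CLASS is set).
    private static final Pattern LOCAL_NAME =
            Pattern.compile("[\\p{IsAlphabetic}%][\\p{IsAlphabetic}\\d]*");

    // Returns true if the whole string matches the pattern.
    static boolean isLocalVariableName(String candidate) {
        return LOCAL_NAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"порусски", "あ", "%x", "a1", "1a", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isLocalVariableName(s));
        }
    }
}
```

This accepts every alphabetic character Java knows about, so it is almost certainly more permissive than any single Caché locale; a stricter check would need the actual character set of the installed locale.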
ANSWER

See docs here and here. To sum up:

A local variable name must be a valid identifier. Its first character must be either a letter or the percent (%) character. Variable names starting with the “%” character are known as “percent variables” and have different scoping rules.
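
As a rough sketch of how the rule quoted above could be checked in a hand-written Java lexer, the code below walks the name one char at a time using Character.isLetter(char). Several things here are assumptions rather than documented behaviour: the class and method names are hypothetical, the rule for characters after the first one (letters or ASCII digits) is not part of the quoted sentence, and restricting names to the BMP is only the asker's inference from the "16 bits" wording. Character.isLetter is also not identical to the Unicode Alphabetic property, and Caché's locale-specific notion of a letter is not modelled.

```java
// Hypothetical helper class for illustration only.
final class BmpNameCheck {

    // First character: '%' or a letter. For a single char, isLetter is false
    // for surrogate halves, so supplementary characters are rejected.
    static boolean isValidStart(char c) {
        return c == '%' || Character.isLetter(c);
    }

    // Assumed rule for later characters: a letter or an ASCII digit.
    static boolean isValidPart(char c) {
        return Character.isLetter(c) || (c >= '0' && c <= '9');
    }

    // Returns true if every char of the name satisfies the rules above.
    static boolean isLocalVariableName(String name) {
        if (name.isEmpty() || !isValidStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!isValidPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "\uD835\uDC9C" is U+1D49C, a letter outside the BMP.
        String[] samples = {"%count", "порусски", "x2", "2x", "\uD835\uDC9C"};
        for (String s : samples) {
            System.out.println(s + " -> " + isLocalVariableName(s));
        }
    }
}
```

Working on char values rather than code points is deliberate here: it keeps the check inside the BMP without any extra range test, at the cost of rejecting letters that a future, non-16-bit implementation might allow.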