regex to match non-latin char with ASCII 0-31 and 128-255


I want to match non-Latin characters. As I understand it, a.matches("[\\x8A-\\xFF]+") should return true, but it returns false:

String a = "Ž";
// Expected this to be true, but matches() returns false
if (a.matches("[\\x8A-\\xFF]+"))
{
    System.out.println("matched");
}

Judging from your title:

Regex to match non-latin char with ASCII 0-31 and 128-255

it seems you're after all characters except those in the range 32-127, and you're surprised that Ž doesn't match.

If this is correct, I suggest you use the expression [^\x20-\x7F] ("all characters except those in the range 32-127"). This does match Ž.
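A minimal sketch of the suggested expression in Java. Note that inside a Java string literal each backslash has to be doubled, so the pattern is written "[^\\x20-\\x7F]+". The class and method names are hypothetical, and Ž is written as the escape \u017D so the example doesn't depend on the source file's encoding.

```java
import java.util.regex.Pattern;

public class NonPrintableAsciiDemo {
    // [^\x20-\x7F]: any character whose code point is NOT in 0x20..0x7F (32..127).
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern OUTSIDE_32_127 = Pattern.compile("[^\\x20-\\x7F]+");

    // Hypothetical helper: true if every character of s lies outside 32..127
    static boolean allOutside32To127(String s) {
        return OUTSIDE_32_127.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(allOutside32To127("\u017D")); // Z-caron (U+017D): true
        System.out.println(allOutside32To127("Z"));      // plain Z (0x5A): false
        System.out.println(allOutside32To127("\t"));     // tab (code point 9): true
    }
}
```

Because the class is negated, it also accepts anything above 0xFF, which is exactly why it catches Ž where a fixed upper bound does not.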

(An exact translation of the regex in your title would look like [\x00-\x1F\x80-\xFF], but this still doesn't match Ž, for the reason described below.)
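A small sketch contrasting the literal translation of the title with a character that does fall in 0x80-0xFF. The names are hypothetical; é (U+00E9) is just an arbitrary example of a character inside that range.

```java
import java.util.regex.Pattern;

public class ExactTitleRangeDemo {
    // Literal translation of "ASCII 0-31 and 128-255":
    // code points 0x00..0x1F and 0x80..0xFF
    private static final Pattern TITLE_RANGES =
            Pattern.compile("[\\x00-\\x1F\\x80-\\xFF]+");

    // Hypothetical helper: true if every character of s is in one of those ranges
    static boolean inTitleRanges(String s) {
        return TITLE_RANGES.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(inTitleRanges("\u00E9")); // e-acute, U+00E9: true
        System.out.println(inTitleRanges("\u017D")); // Z-caron, U+017D: false (0x17D > 0xFF)
    }
}
```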

Why your initial attempt didn't work:

\xNN matches a character by its Unicode value (code point). The Unicode value for Ž is U+017D (0x017D), so it falls outside the range \x8A-\xFF.
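To see the value the regex engine actually compares, you can print the code point. This sketch also uses Java's \x{h...h} form (supported by java.util.regex.Pattern since Java 7) to match Ž by its real code point; the class and helper names are hypothetical.

```java
public class CodePointDemo {
    // Hypothetical helper: the first code point of s, as uppercase hex
    static String hexCodePoint(String s) {
        return Integer.toHexString(s.codePointAt(0)).toUpperCase();
    }

    public static void main(String[] args) {
        String a = "\u017D"; // Z-caron, written as an escape to avoid source-encoding issues

        System.out.println(hexCodePoint(a));              // 17D
        System.out.println(a.matches("[\\x8A-\\xFF]+"));  // false: 0x17D is above 0xFF
        System.out.println(a.matches("\\x{17D}"));        // true: matched by its actual code point
    }
}
```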

When you say "Ž" is 8E, you're most likely looking at a value from an extended ASCII table (such as Windows-1252), and those are not the values the Java regex engine works with.
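A sketch showing where 8E could come from, assuming the extended ASCII table in question is Windows-1252 (an assumption; other 8-bit code pages place Ž elsewhere or not at all). Encoding the string to that charset yields the single byte 0x8E, while the regex engine still sees code point 0x17D. The class and helper names are hypothetical.

```java
import java.nio.charset.Charset;

public class ExtendedAsciiDemo {
    // Assumption: the "extended ASCII" table is Windows-1252,
    // where Z-caron is stored as the single byte 0x8E.
    static int windows1252Byte(String s) {
        byte[] bytes = s.getBytes(Charset.forName("windows-1252"));
        return bytes[0] & 0xFF; // mask to an unsigned 0..255 value
    }

    public static void main(String[] args) {
        String a = "\u017D"; // Z-caron
        System.out.printf("Windows-1252 byte:  %X%n", windows1252Byte(a)); // 8E
        System.out.printf("Unicode code point: %X%n", a.codePointAt(0));  // 17D
    }
}
```

Regex character classes operate on the decoded String (Unicode code points), never on the bytes of a particular 8-bit encoding.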