I'm trying to match all high-ASCII and special UTF-8 characters using PowerShell:
gc $file -readcount 0 | select-string -allmatches -pattern "[\x80-\uffff]"
which should find all the characters I want. However, the regular expression seems to be failing as it's matching the character "i" and "I".
I ran this to test and I'm baffled:
"abcdefghijklmnopqrstuvwxyz" | select-string -allmatches -pattern "[\x80-\uffff]"
Why is it matching "i"? What I also don't get is if you cast the character to an int, the value is 105 which is clearly not within the range specified.
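The reason is that "i" is matched on U+0130 (İ, "Latin Capital Letter I with dot above"), a variant of capital "I" found in Turkish. Select-String is case-insensitive by default, and U+0130, which lies inside the range, is treated as a case variant of "i".
Try with an inverted pattern: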
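A minimal sketch of that idea, assuming the goal is simply "any character outside ASCII": negate the ASCII range and add -CaseSensitive, since Select-String ignores case unless told otherwise. The sample strings are made up for illustration, and the accented letters are built from code points so the script does not depend on file encoding.

```powershell
# Sketch only: match non-ASCII characters without case folding.
# Assumption: case-sensitive matching is acceptable for this use case.
$ascii = "abcdefghijklmnopqrstuvwxyz"
$mixed = "na" + [char]0x00EF + "ve caf" + [char]0x00E9   # "naive cafe" with two accented letters

# Inverted pattern: anything that is NOT in U+0000..U+007F.
$pattern = '[^\x00-\x7F]'

# @(...) yields an empty array when Select-String emits nothing.
$asciiHits = @($ascii | Select-String -AllMatches -CaseSensitive -Pattern $pattern)
$mixedHits = @($mixed | Select-String -AllMatches -CaseSensitive -Pattern $pattern)

"ASCII-only input: $($asciiHits.Count) matching line(s)"

# Print each matched character with its code point.
$mixedHits[0].Matches | ForEach-Object {
    '{0} U+{1:X4}' -f $_.Value, [int][char]$_.Value
}
```

The same -CaseSensitive switch can be added to the original "[\x80-\uffff]" pattern if the inverted form is not wanted.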
Here is how I found out:
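One way to run such a check, as a rough sketch rather than the exact commands used: loop over the code points in the range and ask the regex engine which of them match "i" when case is ignored. Find-CaseVariant is a hypothetical helper name, and the output depends on the .NET version and the current culture, so no particular result is claimed here.

```powershell
# Sketch only: list code points that a case-insensitive regex treats as
# equal to a given target string. Find-CaseVariant is a hypothetical helper.
function Find-CaseVariant {
    param(
        [string]$Target,   # text to match against, e.g. 'i'
        [int]$From,        # first code point to test
        [int]$To           # last code point to test
    )
    $ignoreCase = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
    for ($code = $From; $code -le $To; $code++) {
        # Skip lone surrogates; they are not characters on their own.
        if ($code -ge 0xD800 -and $code -le 0xDFFF) { continue }
        $ch = [string][char]$code
        # Escape so the candidate is taken literally as a pattern.
        if ([regex]::IsMatch($Target, [regex]::Escape($ch), $ignoreCase)) {
            'U+{0:X4} {1}' -f $code, $ch
        }
    }
}

# Which characters in the question's range count as a case variant of "i"?
Find-CaseVariant -Target 'i' -From 0x80 -To 0xFFFF
```

The full scan builds one regex per code point, so it can take a few seconds.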