Let's say I have the characters Ú, Ù, Ü. All of them look similar to the English U.
Is there some list or algorithm to do this:
- Given a Ú or Ù or Ü return the English U
- Given an English U, return the list of all U-similar characters
I'm not sure whether the code point of a Unicode character is the same across all fonts. If it is, I suppose there could be an easy and efficient way to do this?
UPDATE
If you're using Ruby, the unicode-confusable gem is available for this and may help in some cases.
This won't work for all conditions, but one way to get rid of most accents is to convert the characters to their decomposed form, then throw away the combining accents:
Output
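As a minimal sketch of the decomposition approach described above, here it is in Python using the standard `unicodedata` module. The function name `strip_accents` is my own, and the choice to drop only characters in the `Mn` (nonspacing mark) category is an assumption about what "combining accents" should cover:

```python
import unicodedata

def strip_accents(text):
    """Decompose text to NFD, then drop the nonspacing combining marks."""
    # NFD splits a precomposed letter such as "Ú" into "U" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything that is not a nonspacing mark (category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_accents("ÚÙÜ"))  # prints UUU
```

This only helps for characters that have a canonical decomposition. A letter like Ø is a single code point with no decomposition, so it passes through unchanged, which is one of the cases where the approach does not work.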
To find accent characters, use something like:
Output
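For the reverse direction, one possible sketch (again in Python, and again an assumption rather than a definitive method) is to scan every code point and keep those whose decomposed form is the base letter followed only by combining marks. The name `accented_variants` is hypothetical:

```python
import sys
import unicodedata

def accented_variants(base):
    """List characters that decompose to `base` plus nonspacing marks."""
    variants = []
    for cp in range(sys.maxunicode + 1):
        # Skip the surrogate range; those code points are not characters.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        # Require the base letter first and at least one mark after it.
        if (
            len(decomposed) > 1
            and decomposed[0] == base
            and all(unicodedata.category(c) == "Mn" for c in decomposed[1:])
        ):
            variants.append(ch)
    return variants

u_like = accented_variants("U")
print(" ".join(u_like))
```

The result includes Ú, Ù and Ü along with the other precomposed U variants that the Python build's Unicode database knows about. It will not find look-alikes from other scripts, since those are unrelated code points rather than decomposable forms of U.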