I'm using the unidecode module to replace non-ASCII characters with ASCII equivalents. However, there are some characters I want to preserve, for example Greek letters and some symbols like Å. How can I achieve this?
For example,
from unidecode import unidecode
test_str = 'α, Å ©'
unidecode(test_str)
gives the output a, A (c), while what I want is α, Å (c).
Run unidecode on each character individually, and keep a whitelist set of characters that bypass unidecode entirely.
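A minimal sketch of that approach. The whitelist contents (the code point range U+0391 to U+03C9, which covers the basic Greek capital and small letters, plus Å) and the helper name unidecode_except are my own choices for illustration, not part of unidecode. The NFC normalization step is an extra assumption: it makes a decomposed "A" + combining ring (U+030A) match the precomposed Å in the whitelist.

```python
import unicodedata

from unidecode import unidecode

# Example whitelist (an assumption, adjust to taste): Greek letters in
# U+0391..U+03C9 plus the symbol Å. These characters bypass unidecode.
KEEP = {chr(cp) for cp in range(0x0391, 0x03CA)} | {'Å'}


def unidecode_except(text, keep=KEEP):
    """Transliterate text with unidecode, leaving characters in `keep` untouched."""
    # NFC-normalize first so a decomposed 'A' + U+030A becomes the single
    # code point 'Å' and matches the whitelist entry.
    text = unicodedata.normalize('NFC', text)
    # Run unidecode on each character individually; whitelisted ones pass through.
    return ''.join(ch if ch in keep else unidecode(ch) for ch in text)


print(unidecode_except('α, Å ©'))  # α, Å (c)
```

Since unidecode transliterates each code point independently, calling it per character should give the same result for the non-whitelisted characters as calling it on the whole string; it is just somewhat slower on long inputs.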