I see that the asciifolding filter of OpenSearch only handles Latin accents and leaves Greek untouched (note: some accents may not render well on this site because of the font used):
POST /_analyze
{
  "text": [ "Latin: ấ ê ŏ õ ô ì / Greek: ἆ ᾧ ῦ ἄ ἒ " ],
  "filter": [
    "asciifolding"
  ]
}

{
  "tokens": [
    {
      "token": "Latin: a e o o o i / Greek: ἆ ᾧ ῦ ἄ ἒ ",
      "start_offset": 0,
      "end_offset": 38,
      "type": "word",
      "position": 0
    }
  ]
}
Is there any other filter that handles Unicode characters in general, which I could use to strip accents/diacritics from Greek text, or will I have to roll my own?
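To make clear what a home-grown version would have to do, here is a minimal sketch in Python. It runs outside OpenSearch and is only an illustration of the folding I am after, not an OpenSearch filter; the helper name strip_diacritics is hypothetical. It assumes that decomposing the text to Unicode NFD and dropping the combining marks is acceptable for my use case.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove accents/diacritics from Latin, Greek, and other scripts.

    Hypothetical helper for illustration only; not part of OpenSearch.
    """
    # NFD splits each precomposed character into its base letter
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have the Unicode general category "Mn"
    # (Mark, nonspacing); keep everything else.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left into canonical composed form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    sample = "Latin: ấ ê ŏ õ ô ì / Greek: ἆ ᾧ ῦ ἄ ἒ "
    print(strip_diacritics(sample))
```

Unlike asciifolding, this keeps the Greek base letters (it does not transliterate them to ASCII); it only removes the breathing, accent, and iota-subscript marks.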
I have found these two alternative ways to achieve my goal, but I was hoping that something built-in would exist for something so basic:
Any hints or ideas are welcome.