Alternatives to the `asciifolding` filter for removing Greek accents from Unicode text

I see that OpenSearch's `asciifolding` filter handles only Latin accents and does not touch Greek at all (note: some of the accented characters may not render correctly on this site because of the font):

POST /_analyze
{
  "text": [ "Latin: ấ ê ŏ õ ô ì / Greek: ἆ ᾧ ῦ ἄ ἒ " ],
  "filter": [
    "asciifolding"
  ]
}

Response:

{
  "tokens": [
    {
      "token": "Latin: a e o o o i / Greek: ἆ ᾧ ῦ ἄ ἒ ",
      "start_offset": 0,
      "end_offset": 38,
      "type": "word",
      "position": 0
    }
  ]
}
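Each Greek character in the sample is a precomposed letter. Under Unicode canonical decomposition (NFD) it splits into a plain base letter followed by combining marks. For example, U+1F06 (ἆ) becomes U+03B1 + U+0313 + U+0342. The Python sketch below illustrates this outside OpenSearch. `strip_diacritics` is a hypothetical helper name. It drops every nonspacing mark (Unicode category `Mn`) after NFD.

Unlike `asciifolding`, this does not convert to ASCII. Greek letters stay Greek, and ligatures such as `æ` or `ß` are left unchanged.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: remove diacritics from any script.

    NFD splits precomposed letters into base letter + combining marks;
    every nonspacing mark (category "Mn") is then dropped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left so the result is in NFC form.
    return unicodedata.normalize("NFC", kept)


sample = "Latin: ấ ê ŏ õ ô ì / Greek: ἆ ᾧ ῦ ἄ ἒ "
print(strip_diacritics(sample))
# prints: Latin: a e o o o i / Greek: α ω υ α ε
```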

Is there another filter that handles Unicode characters more generally, which I could use to remove accents/diacritics from Greek text, or will I have to roll my own?

I have found these two alternative approaches, but I was hoping something built-in would exist for such a basic task:

  1. Exclude some characters from asciifolding conversion
  2. Elasticsearch modify asciifolding
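If a custom solution turns out to be necessary, one plugin-free option is a `mapping` char filter. Its `source => target` rules can be generated rather than typed by hand.

The Python sketch below assumes that covering the Greek and Coptic block (U+0370 to U+03FF) and the Greek Extended block (U+1F00 to U+1FFF) is enough. The helper functions, the char filter name `greek_diacritics` and the analyzer name `greek_folded` are hypothetical. The script emits one rule for every precomposed Greek letter that reduces to a single base letter once its nonspacing marks are stripped. It then places those rules in a custom analyzer definition.

```python
import json
import unicodedata

# Assumption: these two blocks cover the accented Greek letters of interest
# (Greek and Coptic, plus Greek Extended for polytonic text).
GREEK_RANGES = [(0x0370, 0x03FF), (0x1F00, 0x1FFF)]
LETTER_CATEGORIES = {"Lu", "Ll", "Lt"}


def base_letter(ch: str) -> str:
    """Decompose ch (NFD) and drop every nonspacing mark (category Mn)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def greek_mapping_rules() -> list[str]:
    """Build 'source => target' rules in the format of a `mapping` char filter."""
    rules = []
    for start, end in GREEK_RANGES:
        for cp in range(start, end + 1):
            ch = chr(cp)
            if unicodedata.category(ch) not in LETTER_CATEGORIES:
                continue  # skip unassigned code points, symbols, punctuation
            target = base_letter(ch)
            # Keep only letters that change and reduce to one base letter.
            if target != ch and len(target) == 1:
                rules.append(f"{ch} => {target}")
    return rules


rules = greek_mapping_rules()

# Hypothetical index settings: "greek_diacritics" and "greek_folded" are
# illustrative names, not built-in components.
settings = {
    "analysis": {
        "char_filter": {
            "greek_diacritics": {"type": "mapping", "mappings": rules}
        },
        "analyzer": {
            "greek_folded": {
                "type": "custom",
                "char_filter": ["greek_diacritics"],
                "tokenizer": "standard",
                "filter": ["lowercase", "asciifolding"],
            }
        },
    }
}

print(len(rules), "rules, e.g.", rules[:3])
print(json.dumps(settings, ensure_ascii=False)[:120])
```

A char filter runs on the raw text before the tokenizer, so `asciifolding` can still handle the Latin side afterwards. These rules only match precomposed characters. Text that already arrives in decomposed form (a base letter followed by separate combining marks) would pass through them unchanged.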

Any hints or ideas are welcome.