Elasticsearch - unexpected character when adding Cyrillic translation


I have a field in my index that uses the Ukrainian analyzer:

         name_uk: {
            type: 'keyword',
            fields: {
               raw: {
                  type: 'text',
                  analyzer: 'ukrainian'
               }
            }
         }

I'm not sure whether this issue has anything to do with the analyzer, though, as at this point I'm simply trying to run a bulk update.

     await this.elasticClient.updateByQuery({
         index: 'my-index',
         refresh: true,
         query: {
            match: {
               name_en: translation.original
            }
         },
         script: {
            lang: 'painless',
            source: `ctx._source["name_uk"] = "${translation.translated}"`
         }
      });

This works 99% of the time, but sometimes it simply won't accept the value I give it.

Error committing bulk translations ResponseError: script_exception
    Caused by:
        illegal_argument_exception: unexpected character [А].
    Root causes:
        script_exception: compile error

The character "A" that it's complaining about is the "A" in the translation below:

[screenshot of the translation]

I copy/paste it directly from Google and as mentioned there is no problem 99% of the time.

This also happened earlier with some sort of "H" character (which also has a Cyrillic version), though I don't recall what the full text was.

What's the issue here?

If I simply re-type the A character with the Ukrainian keyboard layout enabled, it works. Likewise, it works with an English A character. What is this special A character that Google Translate produces, and why does it blow up only sometimes?
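To check whether the pasted character really differs from a typed one, the code points can be printed. This is only a minimal diagnostic sketch; `describeCodePoints` is a hypothetical helper name, not part of the Elasticsearch client. It relies on the fact that Latin "A" is U+0041 and Cyrillic "А" is U+0410.

```javascript
// Hypothetical helper: list each character of a string with its Unicode code point,
// so visually identical letters (Latin "A" vs Cyrillic "А") can be told apart.
function describeCodePoints(text) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(text).map((ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    return `${ch} U+${hex}`;
  });
}

// Latin A followed by Cyrillic А (written as an escape to keep the two distinct).
console.log(describeCodePoints("A\u0410"));
// → [ 'A U+0041', 'А U+0410' ]
```

Running it over the whole failing translation, not just the reported letter, would also reveal any quote, backslash, or invisible character sitting in front of the "А".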

Can ES handle this somehow?
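One possibility, which the error output alone does not confirm, is that the failing translations contain a double quote or backslash somewhere before the reported character. Because the value is interpolated into the script source, such a character would end the Painless string literal early, and the remaining Cyrillic text would then be parsed as code. A sketch of the same request that passes the value through the script's `params` instead of the source follows; `buildTranslationUpdate` is a hypothetical helper name, and the index and field names are taken from the snippets above.

```javascript
// Hypothetical helper: build the updateByQuery request so that the translated
// text travels as a script parameter and is never parsed as Painless code.
function buildTranslationUpdate(index, original, translated) {
  return {
    index,
    refresh: true,
    query: {
      match: { name_en: original },
    },
    script: {
      lang: "painless",
      // The source is now a constant string; the value is read from params.
      source: "ctx._source.name_uk = params.translated",
      params: { translated },
    },
  };
}

// A value with embedded quotes, which would break an interpolated script source.
const request = buildTranslationUpdate("my-index", "Apple", 'Яблуко "А"');
console.log(request.script);

// Assumed usage with the client from the question:
// await this.elasticClient.updateByQuery(request);
```

A constant script source also means Elasticsearch compiles the script once and can reuse it from its cache, instead of compiling a new script for every translation.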
