Elasticsearch synonym filter with English analyzer


I would like an analyzer that behaves like the standard English analyzer and also treats a given set of words as synonyms during search.

This is the definition which I tried:

{
  "analysis": {
    "filter": {
      "synonym_en": {
        "type": "synonym",
        "synonyms": [
          "universe, cosmos",
          "women, woman",
          "man, men"
        ]
      },
      "my_filter": {
        "type": "word_delimiter",
        "preserve_original": "false",
        "split_on_numerics": "false"
      }
    },
    "analyzer": {
      "my_analyzer": {
        "type": "custom",
        "filter": [
          "my_filter"
        ],
        "tokenizer": "keyword"
      },
      "my_english": {
        "type": "english",
        "stopwords": [
          "a",
          "an",
          "and",
          "are",
          "as",
          "at",
          "be",
          "but",
          "by",
          "for",
          "if",
          "into",
          "is",
          "it",
          "of",
          "on",
          "or",
          "such",
          "that",
          "the",
          "their",
          "then",
          "there",
          "these",
          "they",
          "this",
          "to",
          "was",
          "will",
          "with"
        ],
        "filter": [
          "synonym_en"
        ]
      }
    }
  }
}

However, I could not get it to work. Indeed, when I run the example:

GET /my_index/_analyze?analyzer=my_english&text='Men'

It only returns the token men, while I would like to have both man and men.

Please also note that a simpler analyzer

{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym", 
          "synonyms": [ 
            "british,english",
            "queen,monarch",
            "man,men"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter" 
          ]
        }
      }
    }
  }
}

seems to work, as it returns both man and men.

How can I get the desired behavior plus the stemming from the English analyzer?


There is 1 best solution below


This is because a synonym filter (or any "filter" setting) is not a configurable parameter of the built-in "english" analyzer. There is a difference between a custom analyzer and a built-in analyzer: built-in analyzers only allow certain parameters to be configured. For the language analyzers, these are the stopwords and the stem exclusion list. The remaining parameters in my_english, which is just an alias for the english analyzer, are simply ignored. The more appropriate behaviour here would probably be to throw an error.
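As a rough sketch of what the built-in analyzer does accept, the settings below configure only the stopwords and the stem exclusion list. The analyzer name my_english comes from the question; the entry in "stem_exclusion" is a hypothetical example word, not something from the original post.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english": {
          "type": "english",
          "stopwords": "_english_",
          "stem_exclusion": ["universe"]
        }
      }
    }
  }
}
```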

Custom analyzers, on the other hand, let you combine a tokenizer of your choice with additional token filters and char filters.

Anyway, if you want to use a synonym filter with the english analyzer, you need to create a custom analyzer that reimplements the english analyzer, as specified in the Elasticsearch documentation on language analyzers. You can then add the synonym filter to it.
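A minimal sketch of such a custom analyzer might look like the following. It assumes the english analyzer can be approximated by the standard tokenizer followed by possessive stemming, lowercasing, English stopword removal and the English stemmer. The names english_with_synonyms, english_stop, english_stemmer and english_possessive_stemmer are illustrative choices, and synonym_en is the filter from the question. The synonym filter is placed after lowercase and before the stemmer, so that "Men" is lowercased before the synonym lookup.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_en": {
          "type": "synonym",
          "synonyms": [
            "universe, cosmos",
            "women, woman",
            "man, men"
          ]
        },
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "english_with_synonyms": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "synonym_en",
            "english_stop",
            "english_stemmer"
          ]
        }
      }
    }
  }
}
```

With an index created from these settings, running the _analyze request from the question against english_with_synonyms instead of my_english should return both man and men, although the exact request syntax and token positions depend on the Elasticsearch version.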