Elasticsearch analyzer settings and matching data


I'm trying an example using the same settings as in the documentation when creating an index:

PUT /my-index-000003
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": { 
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": { 
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": { 
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": { 
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}
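
You can check what this analyzer actually produces with the _analyze API (a sketch, assuming the index was created as my-index-000003 with the settings above):

```
POST /my-index-000003/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I'm feeling :) today"
}
```

With these settings this should emit tokens along the lines of i'm, feeling, _happy_, today: the char filter rewrites :) before tokenization, the pattern tokenizer splits on the characters in [ .,!?], and the stop filter drops English stopwords.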

Then I index a document:

POST /my-index-000003/_doc/1
{
  "content": "I'm feeling :) today, but the weather is quite gloomy :("
}

However, when I search for :) or happy, I can't find a match. Why?

Accepted answer, by Val:

At indexing time, :) gets replaced with _happy_ and :( with _sad_, so you can no longer search for :) or :( — those character sequences never make it into the index.

If you don't want your emoticons to be replaced, use a synonym token filter instead of a character filter.
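
A minimal sketch of that alternative (the filter name emoticon_synonyms is illustrative, not from the original settings): drop the char filter and expand the emoticon with a synonym token filter, so both the emoticon and the word are indexed at the same position:

```
"analysis": {
  "analyzer": {
    "my_custom_analyzer": {
      "tokenizer": "punctuation",
      "filter": [
        "lowercase",
        "emoticon_synonyms",
        "english_stop"
      ]
    }
  },
  "filter": {
    "emoticon_synonyms": {
      "type": "synonym",
      "synonyms": [
        ":) => :), happy",
        ":( => :(, sad"
      ]
    }
  }
}
```

Note that this relies on the custom punctuation tokenizer from the question, which only splits on [ .,!?] and therefore keeps :) intact as a token; a standard tokenizer would strip the emoticon before the synonym filter could see it. With this setup, searching for either :) or happy should match the document.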

Searching for happy will not find _happy_, but searching for _happy_ will work. I was able to reproduce this, and the following query returned the document:

POST /my-index-000003/_search
{
  "query": {
    "match": {
      "content": "_happy_"
    }
  }
}

Note that this will only work if your content field is configured with the my_custom_analyzer analyzer:

{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
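
Putting it together, the settings and the mapping belong in the same index-creation request (a sketch reusing the index name from the question; the "..." stands for the analysis settings shown earlier):

```
PUT /my-index-000003
{
  "settings": {
    "analysis": { ... }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
```

Without the analyzer set in the mapping, content is analyzed with the default standard analyzer, and none of the custom char filter, tokenizer, or stop filter behavior applies.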