I'm trying an example using the same settings as in the documentation when creating an index:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}
Then I index a document:
POST /my-index-000003/_doc/1
{
  "content": "I'm feeling :) today, but the weather is quite gloomy :("
}
However, when I search for :) or happy, I can't find a match. Why?
At indexing time, :) gets replaced with _happy_ and :( with _sad_. So you cannot search for :) or :( anymore.
If you don't want your emoticons to be replaced, you need to use a synonyms token filter instead of a character filter.
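One way to check what actually ends up in the index is the _analyze API. The request below is only a sketch: it assumes the index was created under the name my-index-000003 with the settings from the question, and it reuses the sample sentence from the question as input.

```console
GET /my-index-000003/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I'm feeling :) today, but the weather is quite gloomy :("
}
```

With those settings the response should list the tokens i'm, feeling, _happy_, today, weather, quite, gloomy and _sad_. The emoticons are gone, and the stop words but, the and is have been filtered out.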
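If you search for happy, that will not find _happy_, because the underscores are part of the indexed token. If you search for _happy_, that will work: I was able to reproduce this, and a query for _happy_ returned the document.
Note that this will only work if your content field is configured with the my_custom_analyzer analyzer. The settings in the question only define the analyzer; unless a mapping assigns it to content, the field is analyzed with the default standard analyzer, which never produces the _happy_ token.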
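As a rough sketch of the whole flow (not necessarily the exact query used above), the requests below assign the analyzer to the content field, index the document, and search for _happy_. They assume the index my-index-000003 already exists with the settings from the question and that no document has been indexed into it yet, since the analyzer of an existing field cannot be changed afterwards. The refresh=true parameter is only there so the document is searchable immediately.

```console
PUT /my-index-000003/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_custom_analyzer"
    }
  }
}

POST /my-index-000003/_doc/1?refresh=true
{
  "content": "I'm feeling :) today, but the weather is quite gloomy :("
}

GET /my-index-000003/_search
{
  "query": {
    "match": {
      "content": "_happy_"
    }
  }
}
```

If the index already contains documents that were indexed with the default analyzer, the usual route is to create a new index with both the settings and this mapping, then index the documents again.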