How to make edge_ngram tokens match with a certain quantity of words between them?


I'm trying to build a search request that returns results only when fewer than 5 words appear between the requested tokens.

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "stopWords": {
            "type": "stop",
            "stopwords": [
              "_english_"
            ]
          }
        },
        "normalizer": {
          "lowercaseNormalizer": {
            "filter": [
              "lowercase",
              "asciifolding"
            ],
            "type": "custom",
            "char_filter": []
          }
        },
        "analyzer": {
          "autoCompleteAnalyzer": {
            "filter": [
              "lowercase"
            ],
            "type": "custom",
            "tokenizer": "autoCompleteTokenizer"
          },
          "autoCompleteSearchAnalyzer": {
            "type": "custom",
            "tokenizer": "lowercase"
          },
          "charGroupAnalyzer": {
            "filter": [
              "lowercase"
            ],
            "type": "custom",
            "tokenizer": "charGroupTokenizer"
          }
        },
        "tokenizer": {
          "charGroupTokenizer": {
            "type": "char_group",
            "max_token_length": "20",
            "tokenize_on_chars": [
              "whitespace",
              "-",
              "\n"
            ]
          },
          "autoCompleteTokenizer": {
            "token_chars": [
              "letter"
            ],
            "min_gram": "3",
            "type": "edge_ngram",
            "max_gram": "20"
          }
        }
      }
    }
  }
}

The mappings:

{
  "mappings": {
    "_doc": {
      "properties": {
        "description": {
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 64
                }
              },
              "analyzer": "autoCompleteAnalyzer",
              "search_analyzer": "autoCompleteSearchAnalyzer"
            },
            "text": {
              "type": "text",
              "analyzer": "charGroupAnalyzer"
            }
          }
        }
      }
    }
  }
}

And I run a bool query:

{
    "query": {
        "bool": {            
            "must": [
                {
                    "multi_match": {
                        "fields": [
                            "description.name"                            
                        ],
                        "operator": "and",
                        "query": "rounded elephant",
                        "fuzziness": 1
                    }
                },
                {
                    "match_phrase": {
                        "description.text": {
                            "analyzer": "charGroupAnalyzer",
                            "query": "rounded elephant",
                            "slop": 5,
                            "boost": 20
                        }
                    }
                }
            ]
        }
    }
}

I expect the request to retrieve documents whose description contains:

... rounded very interesting elephant ...

This works well when I use complete words, like rounded elephant.

But when I enter prefixed words, like round eleph, it fails.

It's clear that description.name and description.text use different tokenizers (name contains ngram tokens, while text contains whole-word tokens), so I get completely wrong results.
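For illustration, the mismatch can be seen with the `_analyze` API (the index name `my_index` is just a placeholder):

```json
POST my_index/_analyze
{
  "analyzer": "autoCompleteAnalyzer",
  "text": "rounded"
}
```

This produces the edge ngrams `rou`, `roun`, `round`, `rounde`, `rounded`, whereas running the same text through `charGroupAnalyzer` produces the single token `rounded`. So the prefix `eleph` exists as a term only in `description.name`, never in `description.text`, which is why the `match_phrase` on `description.text` finds nothing for prefixed input.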

How can I configure the mappings and the search to be able to use ngrams with a distance between tokens?
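One possible direction (a sketch, not a verified answer): instead of the `edge_ngram` tokenizer, apply an `edge_ngram` token filter on top of a word-splitting tokenizer. Each ngram then keeps the position of the word it came from, so `match_phrase` with `slop` can still reason about word distance, while the index contains prefixes. The names `autoCompleteFilter` and `charGroupNgramAnalyzer` below are illustrative:

```json
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "autoCompleteFilter": {
            "type": "edge_ngram",
            "min_gram": 3,
            "max_gram": 20
          }
        },
        "analyzer": {
          "charGroupNgramAnalyzer": {
            "type": "custom",
            "tokenizer": "charGroupTokenizer",
            "filter": [
              "lowercase",
              "autoCompleteFilter"
            ]
          }
        }
      }
    }
  }
}
```

Then map `description.text` with `"analyzer": "charGroupNgramAnalyzer"` and `"search_analyzer": "charGroupAnalyzer"`, so that at search time `round eleph` is split into whole tokens that match the indexed prefixes at the original word positions, and the existing `match_phrase` query with `"slop": 5` should apply. This is an untested sketch based on how the `edge_ngram` filter preserves token positions, so it would need verification against the actual data.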
