Elasticsearch - Query only searches 5 characters


I have an issue where, regardless of what value I send in a query, I get no results once the search term goes past five characters.

Example:

  • {"match": {"name": "benjami"}} - Will return no results
  • {"match": {"name": "benja"}} - Return results with name Benja...
  • {"match": {"name": "benjamin"}} - Return results with name Benjamin

Index:

"name" : { "type": "string", "analyzer": "edge_ngram_analyzer" }

settings:

"analyzer": {
    "edge_ngram_analyzer":{
        "type": "custom", "tokenizer": "standard", "filter": ["lowercase","edge_ngram_filter"]}},
"filter": {
    "edge_ngram_filter":{
        "type": "edge_ngram", "min_gram": 1, "max_gram": 40}}

Using term vectors I have found that the field is indexed correctly. The issue seems to be that Elasticsearch is not searching with my full query value. Does anyone have any idea why this happens? Thanks for the help; I'm using Elasticsearch version 5.6.
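
One way to narrow this down is to compare what gets indexed for the field with what the query text is analyzed into, using the _analyze API. Below is a minimal sketch with the official Python client (an assumption; the post does not show any client code), using the issuersearch index name from the settings dump further down and the name field from the mapping:

from elasticsearch import Elasticsearch  # assumption: official elasticsearch-py client

es = Elasticsearch("http://localhost:9200")  # adjust host/port for your cluster

# Tokens produced at index time for the "name" field (uses the field's own analyzer)
index_tokens = es.indices.analyze(
    index="issuersearch",
    body={"field": "name", "text": "Benjamin"},
)

# Tokens the same analyzer produces for the partial query value
query_tokens = es.indices.analyze(
    index="issuersearch",
    body={"analyzer": "edge_ngram_analyzer", "text": "benjami"},
)

print([t["token"] for t in index_tokens["tokens"]])
print([t["token"] for t in query_tokens["tokens"]])

If benjami appears among the indexed tokens but the match query still returns nothing, the problem is on the search side rather than in the index.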

Index

"properties" : { "searchid": {"type": "string", "index": "not_analyzed"},
        "otherId": {"type": "string", "analyzer": "edge_ngram_analyzer"},
        "name": {"type": "string", "analyzer": "edge_ngram_analyzer"},
}

Settings

"settings": {
        "number_of_replicas": 0,
        "analysis": {
            "filter": {"edge_ngram_filter": {"type": "edge_ngram", "min_gram": 2, "max_gram": 80}},
            "analyzer": {
                "edge_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "my_tokenizer",
                    "filter": ["lowercase", "edge_ngram_filter"],
                },
                "short_edge_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "edge_ngram_filter"],
                },
                "case_sensitive": {"type": "custom", "tokenizer": "whitespace", "filter": []}
            },
            "tokenizer": {
                "my_tokenizer": {
                  "type": "edge_ngram",
                  "min_gram": 2,
                  "max_gram": 40,
                  "token_chars": [
                    "letter","digit"
                  ]
                }
        },
        },
    },

Query

{'query': {'function_score': {
    'query': {'bool': {'should': [
        {'multi_match': {'query': 'A162412350', 'fields': ['otherId']}}
    ]}},
    'functions': [{'field_value_factor': {
        'field': 'positionOrActive', 'modifier': 'none',
        'missing': '0', 'factor': '1.1'}}],
    'score_mode': 'sum',
    'boost_mode': 'sum'}},
 'size': 25}

Doc Results

[{u'otherId': u'A1624903499',
  u'positionOrActive': 0,
  'searchScore': 18.152431,
  u'id': 35631},
 {u'otherId': u'A1624903783',
  u'positionOrActive': 0,
  'searchScore': 18.152431,
  u'id': 35632},
 {u'otherId': u'A1624904100',
  u'positionOrActive': 0,
  'searchScore': 18.152431,
  u'id': 35633}]
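
For context, this is roughly how the query above is executed and how those hits are read back; a sketch assuming the official elasticsearch-py client and the issuersearch index from the settings dump below (the actual client code was not posted):

from elasticsearch import Elasticsearch  # assumption: official elasticsearch-py client

es = Elasticsearch("http://localhost:9200")

body = {
    "query": {
        "function_score": {
            "query": {"bool": {"should": [
                {"multi_match": {"query": "A162412350", "fields": ["otherId"]}}
            ]}},
            "functions": [{"field_value_factor": {
                "field": "positionOrActive", "modifier": "none",
                "missing": "0", "factor": "1.1"}}],
            "score_mode": "sum",
            "boost_mode": "sum",
        }
    },
    "size": 25,
}

response = es.search(index="issuersearch", body=body)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("otherId"))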

Settings returned from the index

{
  "issuersearch": {
    "settings": {
      "index": {
        "refresh_interval": "1s",
        "number_of_shards": "1",
        "provided_name": "issuersearch",
        "creation_date": "1602687790617",
        "analysis": {
          "filter": {
            "edge_ngram_filter": {
              "type": "edge_ngram",
              "min_gram": "2",
              "max_gram": "80"
            }
          },
          "analyzer": {
            "edge_ngram_analyzer": {
              "filter": Array[2][
                "lowercase",
                "edge_ngram_filter"
              ],
              "type": "custom",
              "tokenizer": "my_tokenizer"
            },
            "short_edge_ngram_analyzer": {
              "filter": Array[2][
                "lowercase",
                "edge_ngram_filter"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "case_sensitive": {
              "type": "custom",
              "tokenizer": "whitespace"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "token_chars": Array[2][
                "letter",
                "digit"
              ],
              "min_gram": "2",
              "type": "edge_ngram",
              "max_gram": "40"
            }
          }
        },
        "number_of_replicas": "0",
        "uuid": "dexqFx32RXy-AC3HHpfElA",
        "version": {
          "created": "5060599"
        }
      }
    }
  }
}

1 Answer

It might be happening due to the standard tokenizer, which splits tokens on whitespace. You would need to provide a complete example (the full index mapping, sample docs, and the actual results of your search query) to confirm it.
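
For example, a quick _analyze call (sketched here with the Python client; the two-word text is just a hypothetical value) shows how the standard tokenizer splits its input:

from elasticsearch import Elasticsearch  # assumption: official elasticsearch-py client

es = Elasticsearch("http://localhost:9200")

# The standard tokenizer splits the input on whitespace (and punctuation);
# in the edge_ngram_analyzer each resulting token is then n-grammed separately.
result = es.indices.analyze(body={"tokenizer": "standard", "text": "Benjamin Franklin"})
print([t["token"] for t in result["tokens"]])   # ['Benjamin', 'Franklin']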

Also, I hope you are not using a search_analyzer on your name field.
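
One way to verify that is to pull the live mapping and check which analyzers the name field actually has; a minimal sketch with the Python client, assuming the issuersearch index from the settings above (the document type is not shown in the question, so it is iterated over generically):

from elasticsearch import Elasticsearch  # assumption: official elasticsearch-py client

es = Elasticsearch("http://localhost:9200")

# In 5.x the mapping is nested under the document type(s) of the index.
mapping = es.indices.get_mapping(index="issuersearch")
for doc_type, m in mapping["issuersearch"]["mappings"].items():
    name_field = m["properties"]["name"]
    # If "search_analyzer" is absent, the index analyzer is also used at search time.
    print(doc_type, name_field.get("analyzer"), name_field.get("search_analyzer"))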