I have an issue where, regardless of what value I send in a query, I don't get any results once the search term goes past the fifth character.
Example:
- {"match": {"name": "benjami"}} - returns no results
- {"match": {"name": "benja"}} - returns results with name Benja...
- {"match": {"name": "benjamin"}} - returns results with name Benjamin
Index:
"name" : { "type": "string", "analyzer": "edge_ngram_analyzer" }
settings:
"analyzer": {
    "edge_ngram_analyzer": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "edge_ngram_filter"]
    }
},
"filter": {
    "edge_ngram_filter": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 40
    }
}
Using term vectors, I have found that the field is indexed correctly. The issue seems to be that Elasticsearch is not searching my full query value. Does anyone have any idea why this happens? Thank you so much for helping out. I'm using Elasticsearch version 5.6!
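To make the expectation concrete, here is a minimal sketch (plain Python, not Elasticsearch itself) of the tokens the `lowercase` + `edge_ngram_filter` chain above (min_gram 1, max_gram 40) should emit for a single token:

```python
# Sketch of edge n-gram generation, assuming one token from the standard
# tokenizer; this mirrors the filter settings, it is not the ES implementation.
def edge_ngrams(token, min_gram=1, max_gram=40):
    token = token.lower()
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

print(edge_ngrams("Benjamin"))
# ['b', 'be', 'ben', 'benj', 'benja', 'benjam', 'benjami', 'benjamin']
```

Note that "benjami" is among the indexed grams, so if the index really contains these tokens, a query for "benjami" should find the document.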
Index
"properties": {
    "searchid": {"type": "string", "index": "not_analyzed"},
    "otherId": {"type": "string", "analyzer": "edge_ngram_analyzer"},
    "name": {"type": "string", "analyzer": "edge_ngram_analyzer"}
}
Settings
"settings": {
    "number_of_replicas": 0,
    "analysis": {
        "filter": {
            "edge_ngram_filter": {"type": "edge_ngram", "min_gram": 2, "max_gram": 80}
        },
        "analyzer": {
            "edge_ngram_analyzer": {
                "type": "custom",
                "tokenizer": "my_tokenizer",
                "filter": ["lowercase", "edge_ngram_filter"]
            },
            "short_edge_ngram_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "edge_ngram_filter"]
            },
            "case_sensitive": {"type": "custom", "tokenizer": "whitespace", "filter": []}
        },
        "tokenizer": {
            "my_tokenizer": {
                "type": "edge_ngram",
                "min_gram": 2,
                "max_gram": 40,
                "token_chars": ["letter", "digit"]
            }
        }
    }
}
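One detail worth flagging in these settings (sketched below in plain Python under the assumption that both the tokenizer and the filter produce simple prefixes): `my_tokenizer` is itself an edge_ngram tokenizer (2..40), and `edge_ngram_filter` (2..80) is then applied on top of it, re-gramming grams:

```python
# Sketch only, not Elasticsearch: show that applying an edge_ngram filter
# on top of an edge_ngram tokenizer is redundant for a single-word input.
def edge_ngrams(token, min_gram, max_gram):
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

tokenizer_grams = edge_ngrams("a162412350", 2, 40)   # my_tokenizer output
filtered = {g for t in tokenizer_grams               # edge_ngram_filter re-grams
            for g in edge_ngrams(t, 2, 80)}

# Prefixes of prefixes are just prefixes: the filter adds no new tokens here.
print(sorted(filtered) == sorted(set(tokenizer_grams)))  # True
```

So the double n-gramming is harmless for single tokens but does no extra work either; the behavior difference between the analyzers comes from the tokenizer choice.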
Query
{"query":
    {"function_score": {
        "query": {"bool": {"should": [
            {"multi_match": {"query": "A162412350", "fields": ["otherId"]}}
        ]}},
        "functions": [
            {"field_value_factor": {"field": "positionOrActive", "modifier": "none", "missing": "0", "factor": "1.1"}}
        ],
        "score_mode": "sum",
        "boost_mode": "sum"
    }},
 "size": 25}
Doc Results
[{u'otherId': u'A1624903499',
  u'positionOrActive': 0,
  'searchScore': 18.152431,
  u'id': 35631},
 {u'otherId': u'A1624903783',
  u'positionOrActive': 0,
  'searchScore': 18.152431,
  u'id': 35632},
 {u'otherId': u'A1624904100',
  u'positionOrActive': 0,
  'searchScore': 18.152431,
  u'id': 35633}]
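A possible explanation for these hits (a sketch under the assumption that no `search_analyzer` is set, so the same edge_ngram analyzer is applied to the query text at search time): the query itself is decomposed into grams, and the returned documents share a prefix with the query, so short grams match even though the full values differ:

```python
# Sketch, not Elasticsearch: grams shared by the query and one returned doc,
# using the min_gram=2 settings from the index above.
def edge_ngrams(token, min_gram=2, max_gram=40):
    token = token.lower()
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}

query_grams = edge_ngrams("A162412350")
doc_grams = edge_ngrams("A1624903499")
print(sorted(query_grams & doc_grams))
# ['a1', 'a16', 'a162', 'a1624']
```

Those shared short grams would let a `match`/`multi_match` query (which is OR by default) score these documents as hits.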
Settings
{
"issuersearch": {
"settings": {
"index": {
"refresh_interval": "1s",
"number_of_shards": "1",
"provided_name": "issuersearch",
"creation_date": "1602687790617",
"analysis": {
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": "2",
"max_gram": "80"
}
},
"analyzer": {
"edge_ngram_analyzer": {
"filter": [
"lowercase",
"edge_ngram_filter"
],
"type": "custom",
"tokenizer": "my_tokenizer"
},
"short_edge_ngram_analyzer": {
"filter": [
"lowercase",
"edge_ngram_filter"
],
"type": "custom",
"tokenizer": "standard"
},
"case_sensitive": {
"type": "custom",
"tokenizer": "whitespace"
}
},
"tokenizer": {
"my_tokenizer": {
"token_chars": [
"letter",
"digit"
],
"min_gram": "2",
"type": "edge_ngram",
"max_gram": "40"
}
}
},
"number_of_replicas": "0",
"uuid": "dexqFx32RXy-AC3HHpfElA",
"version": {
"created": "5060599"
}
}
}
}
}
It might be happening due to the `standard` tokenizer, which splits tokens on whitespace. You need to provide a complete example (full index mapping, sample docs, and the actual results for your search query) to confirm it. Also, I hope you are not using any `search_analyzer` on your `name` field.
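To illustrate the point about the `standard` tokenizer (a rough Python stand-in, assuming it behaves approximately like an alphanumeric split; the real tokenizer is more sophisticated): a multi-word name is indexed as separate tokens, each of which is then edge-n-grammed on its own:

```python
import re

# Rough stand-in for the standard tokenizer: split on non-word characters
# and lowercase. Not the actual Lucene implementation.
def standard_like_tokens(text):
    return [t.lower() for t in re.findall(r"\w+", text)]

print(standard_like_tokens("Benjamin Smith"))
# ['benjamin', 'smith']
```

Each of those tokens then feeds the edge_ngram filter independently, so grams never span a whitespace boundary.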