I have a query to search for records in the following format: `TR000002_1_2020`.
Users should be able to search for results in any of the following ways: `TR000002`, `2_1_2020`, `TR000002_1_2020`, or `2020`. I am using Elasticsearch 6.8, so I cannot use the built-in search-as-you-type field introduced in Elasticsearch 7. I therefore figured that either wildcard searches or ngram would best suit what I need. Here are my two approaches and why they did not work.
- Wildcard
Property mapping:
    .Text(t => t
        .Name(tr => tr.TestRecordId)
    )
Query:
    m => m.Wildcard(w => w
        .Field(tr => tr.TestRecordId)
        .Value($"*{form.TestRecordId}*")
    ),
This works, but it is case-sensitive: if the user searches with `tr000002_1_2020`, no results are returned (since the `t` and `r` are lowercase in the query).
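The case-sensitivity problem can be reproduced outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch or NEST code: it models the `*value*` wildcard with the standard library's `fnmatch.fnmatchcase`, the helper names `wildcard_match` and `wildcard_match_ci` are hypothetical, and it assumes the stored term keeps its original casing.

```python
from fnmatch import fnmatchcase


def wildcard_match(stored_term: str, user_input: str) -> bool:
    """Case-sensitive model of a *value* wildcard against one stored term."""
    return fnmatchcase(stored_term, f"*{user_input}*")


def wildcard_match_ci(stored_term: str, user_input: str) -> bool:
    """Same check, but with both sides lowercased before comparing."""
    return fnmatchcase(stored_term.lower(), f"*{user_input.lower()}*")


stored = "TR000002_1_2020"
print(wildcard_match(stored, "TR000002"))     # True: casing matches
print(wildcard_match(stored, "tr000002"))     # False: casing differs
print(wildcard_match_ci(stored, "tr000002"))  # True: both sides lowercased
```

In this model, lowercasing both the stored value and the user input removes the mismatch; whether the same holds for a given index depends on how the field is analyzed.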
- ngram (search as you type equivalent)
Create a custom ngram analyzer:
    .Analysis(a => a
        .Analyzers(aa => aa
            .Custom("autocomplete", ca => ca
                .Tokenizer("autocomplete")
                .Filters(new string[] {
                    "lowercase"
                })
            )
            .Custom("autocomplete_search", ca => ca
                .Tokenizer("lowercase")
            )
        )
        .Tokenizers(t => t
            .NGram("autocomplete", e => e
                .MinGram(2)
                .MaxGram(16)
                .TokenChars(new TokenChar[] {
                    TokenChar.Letter,
                    TokenChar.Digit,
                    TokenChar.Punctuation,
                    TokenChar.Symbol
                })
            )
        )
    )
Property Mapping
    .Text(t => t
        .Name(tr => tr.TestRecordId)
        .Analyzer("autocomplete")
        .SearchAnalyzer("autocomplete_search")
    )
Query
    m => m.Match(m => m
        .Query(form.TestRecordId)
    ),
As described in this answer, this does not work because the tokenizer splits the value into grams such as `20`, `02`, and `2020`. As a result, my queries returned every document in my index that contained 2020, such as `TR000002_1_2020`, `TR000008_1_2020`, and `TR000003_6_2020`.
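To make the over-matching concrete, here is a small Python sketch of the index side only. It is a model, not Elasticsearch code: the hypothetical `ngrams` helper assumes the whole ID is kept as a single token (the underscore counting as punctuation) and that grams are lowercased, with the same 2 to 16 length bounds as the tokenizer above.

```python
def ngrams(value: str, min_gram: int = 2, max_gram: int = 16) -> set:
    """All lowercase substrings of value with length min_gram..max_gram."""
    text = value.lower()
    return {
        text[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(text) - size + 1)
    }


ids = ["TR000002_1_2020", "TR000008_1_2020", "TR000003_6_2020"]
index = {doc_id: ngrams(doc_id) for doc_id in ids}

# Grams that every ID has in common; the trailing "2020" contributes several.
shared = set.intersection(*index.values())
print(sorted(g for g in shared if g in {"20", "02", "2020"}))

# Assuming the query side produces the token "2020", an OR-style match
# is satisfied by any document holding that gram, which is all three IDs.
print([doc_id for doc_id, grams in index.items() if "2020" in grams])
```

Only the longer grams, such as `tr000002`, are unique to a single ID, which is why short shared grams dominate the results in this model.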
What is the best way to use Elasticsearch to get my desired search behavior? I've seen `query string` used as well. Thanks!
Here is a simple way to address your requirements (I hope).
With this analysis chain, for the reference `TR000002_1_2020` we get the tokens `["2", "1", "2020"]`. So it will match the queries `["TR000002_1_2020", "TR000002 1 2020", "2_1_2020", "1_2020"]`, but it will not match `3_1_2020` or `2_2_2020`.
Here is an example of mapping and analysis. It's not in NEST, but I think you will be able to make the translation.
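The Elasticsearch mapping itself is not reproduced in this section. Separately, as a rough check of the token behavior described above, the following plain-Python sketch models it under two assumptions that are mine rather than the answer's: tokens are the digit runs of the value with leading zeros removed, and a query matches when its tokens appear consecutively in the document's tokens (phrase-style matching). The helper names `tokens` and `phrase_match` are hypothetical.

```python
import re


def tokens(value: str) -> list:
    """Digit runs of value with leading zeros removed, e.g. '000002' -> '2'."""
    return [run.lstrip("0") or "0" for run in re.findall(r"\d+", value)]


def phrase_match(doc: str, query: str) -> bool:
    """True when the query tokens occur consecutively in the document tokens."""
    doc_tokens, query_tokens = tokens(doc), tokens(query)
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


doc = "TR000002_1_2020"
print(tokens(doc))  # ['2', '1', '2020']
for query in ["TR000002_1_2020", "TR000002 1 2020", "2_1_2020", "1_2020",
              "3_1_2020", "2_2_2020"]:
    print(query, phrase_match(doc, query))
```

Under these assumptions the first four queries match and the last two do not, which agrees with the lists given above. Note that in this model a query such as `TR000002` reduces to the single token `2`, so it would also match any other ID containing a standalone `2`.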