I'm completely new to Hibernate Search and I'm facing a bug in which a search for the query term 0001 matches 007a7358924e4a60923c6a57f58333bf. The field in question is the following:
@FullTextField(analyzer = "edgeNgram")
@Column(name = "serial")
private String serial;
The edgeNgram is declared as:
@Override
public void configure(final LuceneAnalysisConfigurationContext context) {
    context.analyzer("edgeNgram").custom()
            .tokenizer(WhitespaceTokenizerFactory.class)
            .charFilter(HTMLStripCharFilterFactory.class)
            .tokenFilter(ASCIIFoldingFilterFactory.class)
            .tokenFilter(LowerCaseFilterFactory.class)
            .tokenFilter(SnowballPorterFilterFactory.class)
            .tokenFilter(EdgeNGramFilterFactory.class)
                    .param("minGramSize", "2")
                    .param("maxGramSize", "32");
}
And the matching is done with:
private SearchPredicate matchField(SearchPredicateFactory f, String field, String search) {
    return f.match().field(field).matching(search).toPredicate();
}
I don't know whether this really counts as a bug, since I suppose this is how the engine works and the essence of searching is returning results that are not exact matches. But it was raised as a bug, and I'm looking for some way to make 0001 or 000 not match the string above.
I'm open to include any code that you may find useful. I don't really know how to outline this question in a clearer way.
You should try defining a different analyzer to be applied to your search terms without including the ngram filter:
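A minimal sketch of such a search-time analyzer, assuming it is registered inside the same `configure(...)` method as `edgeNgram` and reuses the question's filter chain minus the edge n-gram filter. The name `edgeNgram_query` is a hypothetical choice, not something the library requires:

```java
// Fragment: goes inside configure(LuceneAnalysisConfigurationContext context),
// next to the existing "edgeNgram" definition.
// "edgeNgram_query" is an assumed name; any name works as long as the entity refers to it.
context.analyzer("edgeNgram_query").custom()
        .tokenizer(WhitespaceTokenizerFactory.class)
        .charFilter(HTMLStripCharFilterFactory.class)
        .tokenFilter(ASCIIFoldingFilterFactory.class)
        .tokenFilter(LowerCaseFilterFactory.class)
        .tokenFilter(SnowballPorterFilterFactory.class);
        // No EdgeNGramFilterFactory here, so each query term stays a single whole token.
```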
and then in your entity:
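A sketch of the field mapping, using the `searchAnalyzer` attribute of `@FullTextField` so that indexing keeps the n-gram analyzer while queries use the one without it. The analyzer name `edgeNgram_query` is an assumption: it stands for whatever name you gave the analyzer that has no n-gram filter:

```java
// "edgeNgram" is applied when indexing; "edgeNgram_query" (assumed name) when searching.
@FullTextField(analyzer = "edgeNgram", searchAnalyzer = "edgeNgram_query")
@Column(name = "serial")
private String serial;
```

Documents indexed before the change do not need reindexing for this particular fix, since only the query-side analysis changes.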
What happens is that the same analysis is applied to your search string "0001", so it is tokenized as [00, 000, 0001]. Since your document value 007a7358924e4a60923c6a57f58333bf starts with 00, it was indexed with the token 00 as well, and you get a match.
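To see the effect without running Hibernate Search, here is a rough plain-Java simulation. It is only a sketch: `edgeNgrams` and `matches` are hypothetical helpers that imitate an edge n-gram filter and an "any token in common" match, and they ignore the other filters in the chain.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EdgeNgramDemo {

    // Prefixes of `token` with lengths min..max (capped at the token length),
    // imitating what an edge n-gram filter emits for one token.
    static List<String> edgeNgrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int len = min; len <= Math.min(max, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Simplified match: true if at least one query token equals an indexed token.
    static boolean matches(List<String> indexed, List<String> query) {
        Set<String> indexedSet = new HashSet<>(indexed);
        for (String q : query) {
            if (indexedSet.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = edgeNgrams("007a7358924e4a60923c6a57f58333bf", 2, 32);

        // Query analyzed WITH the n-gram filter: shares the token "00" with the document.
        List<String> ngramQuery = edgeNgrams("0001", 2, 32);
        System.out.println(ngramQuery);                   // [00, 000, 0001]
        System.out.println(matches(indexed, ngramQuery)); // true

        // Query analyzed WITHOUT the n-gram filter: only the whole term "0001".
        List<String> plainQuery = List.of("0001");
        System.out.println(matches(indexed, plainQuery)); // false
    }
}
```

With the n-gram filter removed from the query side, a search term only matches documents whose value starts with the entire term, so 0001 and 000 no longer match, while 007a still would.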