I am trying to get highlighting right using Apache Solr. For a partial match, I want to highlight only the matching part of the word. However, the whole word (which partially matched a search term) is highlighted instead.
Example:
Search for "adida shi", which should yield two items, one named 'adidas shirts' and the other 'adidas red shirts':
/select?q=name:adida+shi&hl=true&hl.fl=name&qt=standard&wt=json
Expected highlighting:
<em>adida</em>s <em>shi</em>rts
<em>adida</em>s red <em>shi</em>rts
Actual highlighting:
<em>adidas</em> <em>shirts</em>
<em>adidas</em> red <em>shirts</em>
The field that is used for highlighting is defined like this in schema.xml:
<field name="name" type="autocomplete_text" indexed="true" stored="true"/>
The field type for the field looks like this:
<fieldType name="autocomplete_text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
I don't have any specific highlighting configuration in the core's config file.
I am using Solr v6.0.1. Highlighting worked as expected in Solr v4.10.4 with the same configuration. I went through the following sections of the Solr wiki and tried various highlighting parameters, but I couldn't make it work:
https://cwiki.apache.org/confluence/display/solr/Highlighting
https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter
Any ideas how to make it work?
Adding this as an answer to follow up on the previous comments.
The issue is most likely caused by EdgeNGramFilterFactory not working as expected: it reports incorrect offsets when generating tokens. This issue has been reopened in Jira several times over the past few versions of Solr.
I solved it in production by setting luceneMatchVersion="4.5" (or whichever version was working for you) for NGramFilterFactory.
I got this solution from a Jira comment, but I can't find it again, so I apologize for not being able to add it as a reference.
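As a minimal sketch of where that setting could go, here is the field type from the question with the attribute added. It assumes luceneMatchVersion is set as an attribute on the n-gram filter itself, which the text above does not spell out. It is shown on EdgeNGramFilterFactory because that is the filter in the question's schema. The value "4.5" is the one mentioned above; whether Solr 6 honours an older version for this filter is not confirmed here, so treat it as something to try rather than a guaranteed fix.

```xml
<fieldType name="autocomplete_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumption: per-filter luceneMatchVersion requests the older token/offset behaviour.
         Replace "4.5" with whichever version last highlighted correctly for you. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"
            luceneMatchVersion="4.5"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because this changes index-time analysis, existing documents would need to be reindexed before the highlighting output can be compared.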