Here is the situation.
There is a field named "content" in my Lucene index, and each document has two values for that field. For example:
- document1 - content: "gas and oil", "energy"
- document2 - content: "gas", "oil"
When I search for "content:(+gas +oil)", both document1 and document2 are returned, which is expected.
As the next step, I want to loop over the content values of the hits:
- "gas and oil"
- "energy"
- "gas"
- "oil"
I used the highlighter, hoping that only "gas and oil" would be returned, because it is the only value that matches the query "(+gas +oil)" by itself.
But what I actually get is:
- "gas and oil"
- "gas"
- "oil"
It seems that the highlighter does not honor the query's required (+) operators: highlighting with "(+gas +oil)" and highlighting with "(gas oil)" make little difference.
Am I using the highlighter wrong? Is there a way to get only "gas and oil"?
The code I used:

    for (final String value : values) {
        final QueryScorer scorer = new QueryScorer(query);
        final Highlighter highlighter = new Highlighter(scorer);
        highlighter.setTextFragmenter(new SimpleFragmenter(2000));
        // Analyze each stored value on its own and try to highlight it.
        final TokenStream tokenStream = analyzer.tokenStream(field, new StringReader(value));
        final CachingTokenFilter filter = new CachingTokenFilter(tokenStream);
        final String highlightedText = highlighter.getBestFragment(filter, value);
        if (StringUtils.isNotBlank(highlightedText)) {
            //TODO
        }
    }
Thanks in advance
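The Highlighter is term-based: it marks every token that matches a term of the query and does not check whether a value satisfies the query as a whole. So the best way to solve this problem is to rebuild the index and organize it in a different way, that is: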
Therefore, when searching for "content:(+gas +oil)", only document1 will be hit.
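If rebuilding the index is not an option, the per-value check can also be done outside the highlighter. The following is only a minimal sketch in plain Java, not Lucene code. The class ValueFilter and its methods tokenize and matchingValues are hypothetical names, and the tokenizer is a crude stand-in for a real Analyzer (lowercase, split on non-alphanumeric characters). It keeps a value only when that single value contains all required terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
public class ValueFilter {

    // Crude stand-in for an Analyzer: lowercase, then split on
    // anything that is not a letter or a digit.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Keep only the values that contain ALL required terms by themselves,
    // which is what "+gas +oil" means when applied to a single value.
    static List<String> matchingValues(List<String> values, Set<String> requiredTerms) {
        List<String> result = new ArrayList<>();
        for (String value : values) {
            if (tokenize(value).containsAll(requiredTerms)) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("gas and oil", "energy", "gas", "oil");
        Set<String> required = new HashSet<>(Arrays.asList("gas", "oil"));
        System.out.println(matchingValues(values, required)); // prints [gas and oil]
    }
}
```

In a real application the terms should come from the same Analyzer that was used at index time, otherwise stemming or stop words may give different results. Lucene's MemoryIndex (lucene-memory module) may be a better fit for this, since it can index one value at a time and run the original Query against it, but I have not tested that with this setup.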