Lucene highlighter hits all terms


Here is the situation.

Each document in my Lucene index has a multi-valued field named "content" that holds two values. For example:

  • document1 - content: "gas and oil", "energy"
  • document2 - content: "gas", "oil"

When I search "content:(+gas +oil)", both document1 and document2 are returned. That is expected, since each document contains both terms somewhere in its "content" field.
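Why both documents match can be shown with a small plain-Java model. This is not Lucene code: it assumes a lowercase whitespace tokenizer as a stand-in for the real analyzer, and it treats the multi-valued field the way Lucene indexes it, with the terms of all values pooled under the one field name. A required (+) clause therefore only needs its term to occur in some value of the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model of document-level matching; not Lucene code.
class DocumentMatchModel {

    // Assumed stand-in for the analyzer: lowercase and split on whitespace.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    // All values of the multi-valued field share one pool of indexed terms,
    // so "+a +b" matches if every required term occurs in ANY value.
    static boolean matchesAll(List<String> values, Set<String> required) {
        Set<String> fieldTerms = new HashSet<>();
        for (String value : values) {
            fieldTerms.addAll(tokens(value));
        }
        return fieldTerms.containsAll(required);
    }

    // Returns the ids of documents that satisfy all required terms.
    static List<String> search(Map<String, List<String>> docs, Set<String> required) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            if (matchesAll(doc.getValue(), required)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    // The two example documents from the question, in insertion order.
    static Map<String, List<String>> sampleDocs() {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("document1", List.of("gas and oil", "energy"));
        docs.put("document2", List.of("gas", "oil"));
        return docs;
    }

    public static void main(String[] args) {
        // content:(+gas +oil)
        System.out.println(search(sampleDocs(), Set.of("gas", "oil")));
        // prints [document1, document2]
    }
}
```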

Next, I want to loop over each "content" value of the hits:

  • "gas and oil"
  • "energy"
  • "gas"
  • "oil"

I use the highlighter to pick out the matching values. I want only "gas and oil" to be returned, because it is the only single value that matches "(+gas +oil)" on its own.

But I actually get:

  • "gas and oil"
  • "gas"
  • "oil"

It seems the highlighter ignores the boolean structure of the query: highlighting with "(+gas +oil)" or with "(gas oil)" gives practically the same result.
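The observed output can be reproduced with a rough plain-Java model. This is not the Highlighter's source: it assumes each value is scored on its own against the set of terms extracted from the query, and that a value yields a fragment as soon as any one of those terms occurs in it. Under that model the + markers are never consulted, which is why "(+gas +oil)" and "(gas oil)" behave alike. The whitespace tokenizer is again an assumed stand-in for the analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Rough model of term-based highlighting; NOT the Lucene Highlighter source.
class TermHighlightModel {

    // Assumed stand-in for the analyzer: lowercase and split on whitespace.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    // A value gets a fragment if it contains at least one query term;
    // whether that term was required (+) or optional plays no role.
    static List<String> highlightedValues(List<String> values, Set<String> queryTerms) {
        List<String> result = new ArrayList<>();
        for (String value : values) {
            Set<String> valueTerms = tokens(value); // fresh set, safe to modify
            valueTerms.retainAll(queryTerms);
            if (!valueTerms.isEmpty()) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = List.of("gas and oil", "energy", "gas", "oil");
        // "(+gas +oil)" and "(gas oil)" both reduce to the term set {gas, oil}
        System.out.println(highlightedValues(values, Set.of("gas", "oil")));
        // prints [gas and oil, gas, oil]
    }
}
```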

Am I using the highlighter wrong? Is there a way to get only "gas and oil"?

Here is the code I used:

for (final String value : values) {
    // QueryScorer scores text by the terms it extracts from the query
    final QueryScorer scorer = new QueryScorer(query);
    final Highlighter highlighter = new Highlighter(scorer);
    // One large fragment, so each value is highlighted as a whole
    highlighter.setTextFragmenter(new SimpleFragmenter(2000));
    // Analyze this single value on its own
    final TokenStream tokenStream = analyzer.tokenStream(field, new StringReader(value));
    final CachingTokenFilter filter = new CachingTokenFilter(tokenStream);
    // Returns null when no query term occurs in this value
    final String highlightedText = highlighter.getBestFragment(filter, value);
    if (StringUtils.isNotBlank(highlightedText)) {
        //TODO
    }
}

Thanks in advance

Answer

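The highlighter works term by term: it marks every query term it finds in a value and does not check whether all required (+) terms occur together in that value. So the best way to solve this problem is to rebuild the index and organize it differently, with one value per document: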

  • document1 - content: "gas and oil"
  • document2 - content: "energy"
  • document3 - content: "gas"
  • document4 - content: "oil"

Then a search for "content:(+gas +oil)" matches only document1, because it is the only document whose single value contains both terms.
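The effect of this reindexing can be checked with the same kind of plain-Java model (again not Lucene code, with a lowercase whitespace tokenizer assumed in place of the analyzer): once every value is a document of its own, a required clause has to be satisfied by that single value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model of the one-value-per-document layout; not Lucene code.
class OneValuePerDocModel {

    // Assumed stand-in for the analyzer: lowercase and split on whitespace.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    // Each document holds exactly one "content" value,
    // so "+gas +oil" must be satisfied by that value alone.
    static List<String> search(Map<String, String> docs, Set<String> required) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            if (tokens(doc.getValue()).containsAll(required)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    // The reindexed layout from the answer, in insertion order.
    static Map<String, String> reindexedDocs() {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("document1", "gas and oil");
        docs.put("document2", "energy");
        docs.put("document3", "gas");
        docs.put("document4", "oil");
        return docs;
    }

    public static void main(String[] args) {
        // content:(+gas +oil)
        System.out.println(search(reindexedDocs(), Set.of("gas", "oil")));
        // prints [document1]
    }
}
```

If the original grouping of values still matters, each per-value document would presumably also need a stored field pointing back to its source record; any such field (say, a hypothetical "parentId") is an assumption and not part of the answer above.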