Finding and ranking multiple phrase matches in Lucene-indexed documents

Given a series of documents containing text, I'd like to search for phrases, return all the matches, and rank them. I know how to get Lucene/Solr to indicate which documents match and to highlight matches within a document, but how do I get a ranking that includes multiple matches from the same document?

First document.  It has a single line of text.
Second document.  This text line is quite short.
This is another line containing more text and is a bit longer.

If I searched for "text line", then I'd like it to find three matches, ranked as follows:

2nd document -> ...This "text line" is quite short.
1st document -> ...It has a single "line of text".
2nd document -> ...another "line containing more text" and is...

Is this possible? How?


There is 1 best solution below


If you want one match per line, make each line its own document. A Lucene "document" is simply the unit that gets indexed, matched, and scored; it does not have to correspond to a single file.

If you want to maintain a link back to the file, index the file's id as well, in a separate (stored) field of each line document.

{ id: "myfile.txt",
  text: "first line" }

{ id: "myfile.txt",
  text: "second line" }
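
As a rough illustration of the splitting step, here is a minimal Python sketch that turns one file's text into one index document per line. It does not call Lucene or Solr at all; it only builds the dictionaries you would then send to your indexer. The function name `lines_to_docs`, the file name `second.txt`, and the `line_no` field are made up for this example, and it assumes your schema does not treat `id` as a unique key (if it does, documents sharing an id would replace each other, so the file name would need to go in a differently named field).

```python
import json


def lines_to_docs(file_id, text):
    """Build one index document per non-empty line of a file's text."""
    docs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines contain nothing to match, so skip them
        docs.append({
            "id": file_id,       # link back to the source file (stored field)
            "line_no": line_no,  # hypothetical extra field to locate the line
            "text": line,        # the field that is searched and scored
        })
    return docs


# The two-line "second document" from the question.
second = (
    "Second document.  This text line is quite short.\n"
    "This is another line containing more text and is a bit longer."
)

docs = lines_to_docs("second.txt", second)
print(json.dumps(docs, indent=2))
```

Each line is now scored on its own, so a query such as "text line" can return both lines of the second file as separate, individually ranked hits, and the `id` field tells you which file each hit came from.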