Finding and ranking multiple phrase matches in Lucene-indexed documents

Given a series of documents containing text, I'd like to search for phrases, return all the matches, and rank them. I know how to get Lucene/Solr to indicate which documents match and to highlight the matches within a document, but how do I get a ranking that includes multiple matches from the same document?

First document.  It has a single line of text.
Second document.  This text line is quite short.
This is another line containing more text and is a bit longer.

If I searched for "text line", then I'd like it to find three matches, ranked as follows:

2nd document -> ...This "text line" is quite short.
1st document -> ...It has a single "line of text".
2nd document -> ...another "line containing more text" and is...

Is this possible? How?

There is 1 best solution below

If you want one match per line, then make each line its own document. Don't confuse the term "document" with a file: a document is simply the unit that gets indexed, matched, and ranked, so a single file can be split into many documents.

If you want to maintain a link back to the file, just index the file's id as well, in a separate (stored) field.

{ "id": "myfile.txt",
  "text": "first line" }

{ "id": "myfile.txt",
  "text": "second line" }
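
As a rough illustration of the splitting step, here is a minimal Python sketch that turns one file's text into per-line documents shaped like the ones above. The function name `lines_to_docs` and the extra `line_no` field are assumptions added for illustration, not part of Lucene or Solr; sending the resulting documents to the index is left out.

```python
import json


def lines_to_docs(file_id, text):
    """Split text into one index document per non-empty line.

    `line_no` is a hypothetical extra field so a hit can be traced
    back to its position in the original file.
    """
    docs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines contain nothing to match
        docs.append({"id": file_id, "line_no": line_no, "text": line})
    return docs


if __name__ == "__main__":
    sample = "first line\nsecond line\n"
    print(json.dumps(lines_to_docs("myfile.txt", sample), indent=2))
```

One caveat: if your schema treats `id` as a unique key, as Solr schemas commonly do, documents that share the same value would overwrite one another. In that case the file name would need to go in a differently named stored field, with a distinct key per line.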