Solr More Like This feature: how to do a loose search for similar text with some degree of freedom

Let's say I have a long text field and I want to search for "Term1 Term2 Term3 Term4".

I'd like to surface similar documents in a relaxed way:

  1. Other terms can come in between, within reason: a doc with "Term1 OtherTerm Term2 OtherTerm Term3" is acceptable.

  2. Not all 4 terms have to appear, again within reason (3 terms are OK).

From my experiments, it seems like Solr retrieves only documents containing the exact(!) text I searched for...

I tried adding all the params with lowered limits. Raw query params: mlt=true&mlt.fl=Text&mlt.boost=true&mlt.mindf=1&mlt.mintf=0&mlt.interestingTerms=Text

So, is it possible to make similarity search work, and not only exact search?

Answer:

The mlt parameters only govern how the MoreLikeThis operation works. MoreLikeThis works in two stages. First, it gets a set of results from the query, before any MoreLikeThis functionality comes into play. Then it takes the results of that query and looks up documents similar to them. It generally does this by picking what it judges to be the most relevant and useful search terms from the body of each result document, and searching on them. So the mlt parameters have nothing to do with how your initial query is handled. Usually, you want your initial query to return very few results, often a single document.
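To make the two stages concrete, here is a toy, in-memory Python sketch. It is not Solr's implementation: the corpus, the function names and the plain tf-idf weighting are assumptions for illustration only, although max_terms, min_tf and min_df loosely play the roles of mlt.maxqt, mlt.mintf and mlt.mindf.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a Solr index.
DOCS = {
    "d1": "solr more like this finds similar documents using interesting terms",
    "d2": "similar documents share interesting terms such as solr and lucene",
    "d3": "a recipe for bread with flour water and salt",
    "d4": "lucene scores documents by term frequency and inverse document frequency",
}
TOKENS = {doc_id: text.split() for doc_id, text in DOCS.items()}


def doc_freq(term):
    """Number of documents that contain the term."""
    return sum(term in toks for toks in TOKENS.values())


def stage1_query(terms):
    """Stage 1: the ordinary query, untouched by any mlt setting.
    Here it simply returns documents containing every query term."""
    return [d for d, toks in TOKENS.items() if all(t in toks for t in terms)]


def interesting_terms(doc_id, max_terms=3, min_tf=1, min_df=1):
    """Pick the highest-weighted terms from one source document (rough tf-idf)."""
    n = len(TOKENS)
    scores = {}
    for term, tf in Counter(TOKENS[doc_id]).items():
        df = doc_freq(term)
        if tf < min_tf or df < min_df:
            continue  # too rare in this doc or in the corpus: ignore it
        scores[term] = tf * math.log(n / df)
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:max_terms]


def stage2_similar(doc_id, terms):
    """Stage 2: OR-search on the interesting terms, skipping the source doc."""
    hits = []
    for d, toks in TOKENS.items():
        if d == doc_id:
            continue
        overlap = sum(t in toks for t in terms)
        if overlap:
            hits.append((d, overlap))
    return sorted(hits, key=lambda h: (-h[1], h[0]))


source = stage1_query(["more", "like", "this"])[0]
terms = interesting_terms(source, min_df=2)
print(terms, stage2_similar(source, terms))
```

Note that nothing in stage 2 changes which documents stage 1 finds; loosening the mlt limits only changes which terms get pulled out of the documents stage 1 already returned.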

It sounds like you don't want a phrase query at all, so drop the quotes.

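  • "Term1 Term2 Term3 Term4" = phrase query: finds documents where all of those terms appear next to each other, in that order.
  • Term1 Term2 Term3 Term4 = a series of separate term queries: finds documents containing any of the terms (or all of them, depending on the default operator) anywhere in the field.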
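A small Python sketch of the difference, applied to the example document from the question. The two functions are hypothetical stand-ins that only mimic the matching rules (an exact phrase with no slop, versus counting individual terms); they are not how Lucene evaluates queries internally.

```python
def phrase_match(tokens, terms):
    """Phrase query with no slop: the terms must be adjacent and in order."""
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


def matched_terms(tokens, terms):
    """Separate term queries: count how many of the terms occur anywhere."""
    present = set(tokens)
    return sum(t in present for t in terms)


query = ["Term1", "Term2", "Term3", "Term4"]
doc = "Term1 OtherTerm Term2 OtherTerm Term3".split()

print(phrase_match(doc, query))   # False: terms are not adjacent and Term4 is missing
print(matched_terms(doc, query))  # 3
```

The phrase version rejects the document outright, while the term version still sees 3 of the 4 terms, which is the kind of relaxed match the question asks for.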

See the Lucene query parser syntax documentation for more information.
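On the wire, the only change is the quotes in the q parameter. A minimal Python sketch that builds both request URLs with the standard library; the host and the core name "mycore" are placeholders, and mlt.interestingTerms is left out because, as far as I know, it takes list, details or none rather than a field name.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint: host and core name "mycore" are placeholders.
SELECT_URL = "http://localhost:8983/solr/mycore/select"

# MoreLikeThis parameters taken from the question.
MLT_PARAMS = {
    "mlt": "true",
    "mlt.fl": "Text",
    "mlt.boost": "true",
    "mlt.mindf": "1",
    "mlt.mintf": "0",
}


def build_url(q):
    """Build a select URL for query q plus the MoreLikeThis parameters."""
    return SELECT_URL + "?" + urlencode({"q": q, **MLT_PARAMS})


phrase_url = build_url('"Term1 Term2 Term3 Term4"')  # phrase query
terms_url = build_url("Term1 Term2 Term3 Term4")     # separate term queries
print(phrase_url)
print(terms_url)
```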