Sunspot / Solr / Lucene: Find similar articles

Let's say we have a list of articles that are indexed by sunspot/solr/lucene (or any other search engine).

How can such an index be used to find articles similar to a given article?

Should this be done with a summarizing (keyword-extraction) tool, such as http://www.wordsfinder.com/api_Keyword_Extractor.php, termextract from http://developer.yahoo.com/yql/console, or http://www.alchemyapi.com/api/demo.html?

Best answer

What you are trying to do is very similar to the task I outlined in this answer.

In brief, you need to generate a summary for each document and use it as a query to compare that document with every other one. A document summary could be as simple as the top N terms in that document (excluding stop words). You can generate the top N terms from a Lucene document pretty easily without any third-party tools; there are plenty of examples on SO and elsewhere on the web.
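To make the "top N terms as a query" idea concrete, here is a minimal, engine-independent sketch in plain Python. It is not Lucene code: the tiny stop-word list, the regex tokenizer, and the scoring rule (sum of how often each summary term appears in the candidate) are illustrative assumptions, and the function names `top_terms` and `similar_articles` are hypothetical. In a real Lucene/Solr setup you would instead pass the summary terms as an OR query and let the engine's relevance scoring rank the results.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption: a real system would use a fuller one).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on",
              "for", "with", "that", "it", "as", "are", "by"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_terms(text, n=5):
    """Summarize a document as its n most frequent non-stop-word terms."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(n)]

def similar_articles(target_id, articles, n_terms=5):
    """Use the target's top terms as a query and rank every other article.

    Score = total occurrences of the summary terms in the candidate
    (a deliberately simple stand-in for a search engine's relevance score).
    """
    summary = top_terms(articles[target_id], n_terms)
    scored = []
    for doc_id, text in articles.items():
        if doc_id == target_id:
            continue  # don't compare the article with itself
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in summary)
        scored.append((score, doc_id))
    # Highest score first; ties broken by id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]

if __name__ == "__main__":
    articles = {
        "a1": "Solr indexes documents and Solr ranks search results",
        "a2": "Lucene is the search library underneath Solr",
        "a3": "Recipes for baking bread at home",
    }
    print(similar_articles("a1", articles))
```

With these sample articles, "a2" shares "solr" and "search" with the summary of "a1" and is returned, while "a3" shares nothing and is dropped. Raw term counts favour long documents; weighting terms by how rare they are across the collection (tf-idf) is the usual refinement.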

Other answer

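It seems you're looking for the MoreLikeThis feature. Solr provides it (built on Lucene's MoreLikeThis class), and Sunspot exposes it through `Sunspot.more_like_this`.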
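MoreLikeThis does essentially what the other answer describes: it picks "interesting" terms from a seed document and runs them as a query. As a rough sketch, the Python below only builds the HTTP request URL for a Solr MoreLikeThis handler; it does not send it. Assumptions, all hypothetical: a Solr instance at `http://localhost:8983`, a core named `articles`, a `MoreLikeThisHandler` registered at `/mlt` in `solrconfig.xml`, a unique key field `id`, and text fields `title` and `body`. The `mlt.fl`, `mlt.mintf` and `mlt.mindf` parameter names come from Solr's MoreLikeThis parameters; check the documentation for your Solr version before relying on them.

```python
from urllib.parse import urlencode

def mlt_url(base_url, core, doc_id, fields, count=5):
    """Build (but do not send) a Solr MoreLikeThis request for doc_id.

    Assumes a MoreLikeThisHandler is registered at /mlt for this core
    and that the unique key field is named "id".
    """
    params = {
        "q": f"id:{doc_id}",        # seed document, matched by its unique key
        "mlt.fl": ",".join(fields),  # fields mined for "interesting" terms
        "mlt.mintf": 1,              # minimum term frequency in the seed doc
        "mlt.mindf": 1,              # minimum document frequency in the index
        "rows": count,               # how many similar documents to return
        "wt": "json",                # ask for a JSON response
    }
    return f"{base_url}/solr/{core}/mlt?{urlencode(params)}"

if __name__ == "__main__":
    # Hypothetical host, core, document id and field names.
    print(mlt_url("http://localhost:8983", "articles", "42", ["title", "body"]))
```

Setting `mlt.mintf` and `mlt.mindf` to 1 keeps a small test index from filtering out every candidate term; on a large index you would usually raise them so only distinctive terms end up in the generated query.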