How to always recommend different documents (files) in Elasticsearch

Question

193 Views Asked by zoran jeremic At 27 July 2025 at 14:07

I have a service that recommends documents (files) relevant to the user's current context. It uses the Elasticsearch more_like_this query in combination with filters (see the query below). The documents are uploaded by users, and a public document can be recommended to other users. This works fine, but a problem arises when two or more users upload the same file: Elasticsearch then holds two or more instances of the same document, and it is very likely that both (or even more) of them will be recommended.

Does anyone have an idea how I could force Elasticsearch to ignore these duplicates and return only one instance of the same file?

POST _search
{
  "query": {
    "filtered": {
      "query": {
        "mlt": {
          "fields": ["file"],
          "like_text": "Some sample text here",
          "min_term_freq": 1,
          "max_query_terms": 1,
          "min_doc_freq": 1
        }
      },
      "filter": {
        "or": {
          "filters": [
            {
              "term": { "visibility": "public" }
            },
            {
              "and": {
                "filters": [
                  { "term": { "visibility": "private" } },
                  { "term": { "ownerId": 2 } }
                ]
              }
            }
          ]
        }
      }
    }
  },
  "fields": ["id", "title", "visibility", "ownerId", "contentType", "dateCreated", "url"]
}

Edited:

I solved the first part of this problem. I'm using Tika to extract the content from a web page or text document. I then use that content as the like text in a More Like This query to find the most similar documents, and those with a similarity value higher than 0.9 are marked as duplicates. For this, I'm using a new field, "uniqueness", which holds a UUID value. If the new document to be indexed is a duplicate, I copy the "uniqueness" value of the document it duplicates; if there are no duplicates, I create a new "uniqueness" value for that document.
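The assignment rule described above can be sketched roughly as follows. This is only an illustration, not the author's actual code: `assign_uniqueness`, the `(similarity, uniqueness)` pair shape and the `threshold` parameter are hypothetical names, and the step that obtains the similarity values from the More Like This results is left out. Only the 0.9 cutoff and the UUID-valued "uniqueness" field come from the description.

```python
import uuid

DUPLICATE_THRESHOLD = 0.9  # cutoff taken from the description above


def assign_uniqueness(matches, threshold=DUPLICATE_THRESHOLD):
    """Return the "uniqueness" value for a document about to be indexed.

    `matches` is a hypothetical list of (similarity, uniqueness) pairs for
    the most similar documents already in the index.
    """
    # Keep only matches that are similar enough to count as duplicates.
    duplicates = [(sim, uid) for sim, uid in matches if sim > threshold]
    if duplicates:
        # Reuse the value of the closest existing duplicate.
        return max(duplicates, key=lambda pair: pair[0])[1]
    # No duplicate found: this document starts a new group.
    return str(uuid.uuid4())
```

With this rule, every copy of the same file ends up sharing one "uniqueness" value, which is what makes grouping at query time possible.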

However, I still have not solved the second part of the problem: how to write a query that eliminates these duplicates. Basically, the query above needs an additional part that selects only one instance among documents with the same value of the "uniqueness" field.

Does anybody have an idea how to solve this?
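One possible workaround, which is not from the original thread, is to drop the repeats on the client side after the search returns. The sketch below assumes that "uniqueness" has been added to the "fields" list of the request and that the hits arrive in score order under `response["hits"]["hits"]`; `first_per_uniqueness` is a hypothetical helper name.

```python
def first_per_uniqueness(hits, field="uniqueness"):
    """Keep only the first (highest-ranked) hit for each `field` value."""
    seen = set()
    unique_hits = []
    for hit in hits:
        value = hit.get("fields", {}).get(field)
        # Field values may come back wrapped in a list; unwrap if so.
        if isinstance(value, list):
            value = value[0] if value else None
        if value is None:
            # No uniqueness value: treat the hit as its own group.
            unique_hits.append(hit)
            continue
        if value not in seen:
            seen.add(value)
            unique_hits.append(hit)
    return unique_hits
```

Because duplicates are removed after the search, a page may contain fewer results than requested, so it may help to request more hits than are actually needed.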

**fatih** · Answer 1

fatih On 13 January 2014 at 10:51

You can define a "duplicate" field and, during indexing, set its value to "true" or to the id of the document it duplicates. Then you can filter out these documents in the query.
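As a rough sketch of how such a filter might be combined with the query from the question, the snippet below builds the filter part as a Python dictionary. It rests on assumptions the answer does not state: that "duplicate" is indexed as a boolean set to true on every copy except one, and that the `bool` filter with `must_not` is used to exclude those copies. `exclude_duplicates` is a hypothetical helper name.

```python
import json


def exclude_duplicates(visibility_filter, field="duplicate"):
    """Wrap an existing filter so that flagged duplicates are excluded."""
    return {
        "bool": {
            "must": [visibility_filter],
            # Assumes `field` is indexed as a boolean set to true on duplicates.
            "must_not": [{"term": {field: True}}],
        }
    }


# The visibility filter from the question's query, unchanged.
visibility_filter = {
    "or": {
        "filters": [
            {"term": {"visibility": "public"}},
            {
                "and": {
                    "filters": [
                        {"term": {"visibility": "private"}},
                        {"term": {"ownerId": 2}},
                    ]
                }
            },
        ]
    }
}

print(json.dumps(exclude_duplicates(visibility_filter), indent=2))
```

The printed object would take the place of the "filter" value inside "filtered". One caveat with this design: if the only copy not flagged as a duplicate is private to another user, the current user sees none of the copies.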
