Elasticsearch should clauses produce different scores


I am retrieving documents by filtering and using a bool query to apply a score. For example:

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "color": "Yellow"
          }
        },
        {
          "term": {
            "color": "Red"
          }
        },
        {
          "term": {
            "color": "Blue"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

If a document contains only "Yellow", it gets a score of 1.5, but if a document contains only "Red", it gets a score of 1.4. I want the scores to be the same. Each document has only one match, so why are the scores different? Is there a way to ignore the order of the terms in the should query? When there is only one match, the "Yellow" document always gets the higher score...

UPDATE: The issue is not the order of the terms in the should array but the "number of documents containing the term".


There are 3 best solutions below

2
On BEST ANSWER

If scoring is not important to you, you can wrap the bool/should clause in a filter clause.

Filter context skips scoring and treats the query as a plain yes/no match, so the score of every matched document will always be 0.0.

{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {
              "term": {
                "color.keyword": "Yellow"
              }
            },
            {
              "term": {
                "color.keyword": "Black"
              }
            },
            {
              "term": {
                "color.keyword": "Purple"
              }
            }
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
} 

The score of a matched document depends on several factors, such as the length of the field, the frequency of the term, and the total number of documents.

You can learn more about how the score is calculated by using the explain parameter of the search API:

GET /_search?explain=true

@ESCoder using the example above I have:

"Yellow"

{
  "value" : 1.5995531,
  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
  "details" : [
    {
      "value" : 30,
      "description" : "n, number of documents containing term",
      "details" : [ ]
    },
    {
      "value" : 150,
      "description" : "N, total number of documents with field",
      "details" : [ ]
    }
  ]
},

"Red"

{
  "value" : 1.0375981,
  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
  "details" : [
    {
      "value" : 53,
      "description" : "n, number of documents containing term",
      "details" : [ ]
    },
    {
      "value" : 150,
      "description" : "N, total number of documents with field",
      "details" : [ ]
    }
  ]
},

Each term (Red and Yellow) appears only once in each document. I want the same score whether a document has Red or Yellow; I don't care how many documents contain each term. If one document has only Yellow and another has only Red, I would like both to get the same score. Is that possible?
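
For reference, the two idf values in the explain output above can be reproduced by hand from the formula in the "description" field. The following is only a minimal sketch: the function name bm25_idf is made up for illustration, and the n and N values are the ones quoted in the explain output. It shows that the rarer term (Yellow, 30 documents) gets a larger idf than the more common one (Red, 53 documents), which is where the score difference comes from.

```python
import math

def bm25_idf(n: int, N: int) -> float:
    """idf as described in the explain output: log(1 + (N - n + 0.5) / (n + 0.5)).

    n: number of documents containing the term
    N: total number of documents with the field
    (bm25_idf is a hypothetical helper name, not an Elasticsearch API.)
    """
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

# Values taken from the explain output quoted above.
yellow_idf = bm25_idf(n=30, N=150)
red_idf = bm25_idf(n=53, N=150)

print("Yellow idf:", yellow_idf)
print("Red idf:   ", red_idf)
```

Because idf is only one factor of the final score, the printed numbers will not equal the overall document scores, but the fewer documents contain a term, the higher that term's contribution.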


As others have mentioned, the score depends on numerous factors. However, if you want to ignore all of them, you can use constant_score to assign a fixed score whenever a document matches a specific term, e.g.:

{
  "query": {
    "bool": {
      "should": [
        {
          "constant_score": {
            "filter": {
              "term": {
                "color": "Yellow"
              }
            },
            "boost": 1
          }
        },
        {
          "constant_score": {
            "filter": {
              "term": {
                "color": "Red"
              }
            },
            "boost": 1
          }
        },
        {
          "constant_score": {
            "filter": {
              "term": {
                "color": "Blue"
              }
            },
            "boost": 1
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

I believe this should fulfill your requirement.
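
If the list of colors changes often, the same request body could be generated rather than written by hand. This is just a sketch under assumptions: build_constant_score_query is a hypothetical helper (not part of any Elasticsearch client), and it only builds the JSON body shown above; sending it to the cluster is left to whatever client you use.

```python
import json

def build_constant_score_query(field, values, boost=1.0):
    """Build a bool/should query where every matching term scores `boost`.

    Hypothetical helper for illustration; it mirrors the hand-written
    query above and does not talk to Elasticsearch itself.
    """
    should = [
        {"constant_score": {"filter": {"term": {field: value}}, "boost": boost}}
        for value in values
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}

query = build_constant_score_query("color", ["Yellow", "Red", "Blue"])
print(json.dumps(query, indent=2))
```

Note that the bool query adds up the scores of all matching should clauses, so a document matching two of the colors would score higher than one matching a single color. That does not matter here, since each document is said to contain only one of them.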