Elasticsearch - Boost only once in a boolean query

I'm not really sure how to word my question, but take the following example object:

{
  "pricing": [
    {"cost": 5000, "style": "fixed"},
    {"cost_min": 100, "cost_max": 500, "style": "range"},
    {"style": "fixed"}
  ]
}

What I'm trying to do is (pseudo logic):

Boost score by X IF exists(pricing.cost) OR (exists(pricing.cost_min) AND exists(pricing.cost_max))
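The intended rule can be sketched as a tiny Python function. This only illustrates the desired result and is not an Elasticsearch query. It assumes documents are plain dicts shaped like the example, treats a field as existing when any entry in pricing has a non-null value for it, and uses the hypothetical names field_exists and desired_boost.

```python
from typing import Any


def field_exists(doc: dict[str, Any], field: str) -> bool:
    """True if any entry in doc["pricing"] has a non-null value for `field`."""
    return any(entry.get(field) is not None for entry in doc.get("pricing", []))


def desired_boost(doc: dict[str, Any], boost: float) -> float:
    """Apply `boost` exactly once if the pseudo-logic condition holds, else 0."""
    has_cost = field_exists(doc, "cost")
    has_range = field_exists(doc, "cost_min") and field_exists(doc, "cost_max")
    return boost if (has_cost or has_range) else 0.0


example = {
    "pricing": [
        {"cost": 5000, "style": "fixed"},
        {"cost_min": 100, "cost_max": 500, "style": "range"},
        {"style": "fixed"},
    ]
}

print(desired_boost(example, 2))  # 2
```

Both halves of the condition are true for the example, but the boost is still applied only once.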

This is what I currently have:

"bool": {
    "should": [
        {
            "exists": {
                "field": "pricing.cost",
                "boost": 2
            }
        },
        {
            "bool": {
                "should": [
                    {
                        "exists": {
                            "field": "pricing.cost_min"
                        }
                    },
                    {
                        "exists": {
                            "field": "pricing.cost_max"
                        }
                    }
                ],
                "minimum_should_match": 2
            }
        }
    ],
    "minimum_should_match": 1,
    "boost": 1
}

It works, except that for the example object I gave, it boosts "twice", giving a score of 4 when I really want a score of 2.
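As a sketch of where the 4 comes from (an illustrative Python model, not Elasticsearch code; flatten_objects and object_mapping_score are hypothetical names): with the default object mapping, Elasticsearch flattens an array of objects into one list of values per field. So pricing.cost, pricing.cost_min and pricing.cost_max all exist on the same document, and the bool query adds up the constant score of every matching should clause.

```python
from collections import defaultdict
from typing import Any


def flatten_objects(doc: dict[str, Any], path: str) -> dict[str, list[Any]]:
    """Mimic object-field flattening, e.g. {"pricing.cost": [5000], ...}."""
    flat: dict[str, list[Any]] = defaultdict(list)
    for entry in doc.get(path, []):
        for key, value in entry.items():
            if value is not None:  # null values are not indexed
                flat[f"{path}.{key}"].append(value)
    return dict(flat)


def object_mapping_score(doc: dict[str, Any]) -> float:
    """Score the question's bool query against the flattened document (0.0 = no match)."""
    flat = flatten_objects(doc, "pricing")
    score = 0.0
    # should clause 1: exists on pricing.cost with boost 2
    if "pricing.cost" in flat:
        score += 2.0
    # should clause 2: inner bool with minimum_should_match 2, so both range
    # fields must exist; each exists clause contributes its default boost of 1
    if "pricing.cost_min" in flat and "pricing.cost_max" in flat:
        score += 1.0 + 1.0
    # the outer bool has boost 1, so the summed score is returned unchanged
    return score


example = {
    "pricing": [
        {"cost": 5000, "style": "fixed"},
        {"cost_min": 100, "cost_max": 500, "style": "range"},
        {"style": "fixed"},
    ]
}

print(flatten_objects(example, "pricing"))
print(object_mapping_score(example))  # 4.0
```

For the example object this gives 2 + 1 + 1 = 4.0, which matches the reported score.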

rabbitbr answered:

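By default each clause has a boost of 1, so the pricing.cost_min and pricing.cost_max exists clauses each add 1 to the score on top of the 2 from pricing.cost. Since you want a score of 2 rather than 4, you would have to set "boost": 0 on the pricing.cost_min and pricing.cost_max clauses.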
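This arithmetic can be checked with a small Python model of the query, assuming each matching exists clause scores exactly its boost and a bool query sums the scores of its matching should clauses, multiplied by its own boost. Clause, exists_clause, bool_should and question_query are hypothetical names for this sketch, not Elasticsearch APIs.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    matched: bool
    score: float  # score contributed when matched


def exists_clause(matched: bool, boost: float = 1.0) -> Clause:
    # exists is a constant-score query: when it matches, its score is its boost
    return Clause(matched, boost if matched else 0.0)


def bool_should(clauses: list[Clause], minimum_should_match: int,
                boost: float = 1.0) -> Clause:
    # matches if enough should clauses match; score = boost * sum of matching scores
    n_matched = sum(c.matched for c in clauses)
    if n_matched < minimum_should_match:
        return Clause(False, 0.0)
    return Clause(True, boost * sum(c.score for c in clauses if c.matched))


def question_query(has_cost: bool, has_min: bool, has_max: bool,
                   range_boost: float = 1.0) -> float:
    # inner bool: both range fields required (minimum_should_match 2)
    range_part = bool_should(
        [exists_clause(has_min, range_boost), exists_clause(has_max, range_boost)],
        minimum_should_match=2,
    )
    # outer bool: exists on pricing.cost with boost 2, OR the range bool
    top = bool_should([exists_clause(has_cost, 2), range_part],
                      minimum_should_match=1)
    return top.score


print(question_query(True, True, True))                 # 4.0
print(question_query(True, True, True, range_boost=0))  # 2.0
```

In this model the range clauses still match with a boost of 0, so minimum_should_match is still satisfied; they simply add nothing to the score. One consequence is that a document with only cost_min and cost_max would still match but score 0 for this part of the query.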

hkulekci answered:

You can get a score of 2 for this document if you map pricing as a nested field. This changes your query too, because it then has to be wrapped in a nested query. Here is a full example for your document:

PUT pricing_with_nested
{
  "mappings": {
    "properties": {
      "pricing":{
        "type": "nested"
      }
    }
  }
}

POST pricing_with_nested/_doc
{
  "pricing": [
    {"cost": 5000, "style": "fixed"},
    {"cost_min": 100, "cost_max": 500, "style": "range"},
    {"style": "fixed"}
  ]
}

GET pricing_with_nested/_search
{
  "explain": true, 
  "query": {
    "nested": {
      "path": "pricing",
      "query": {
        "bool": {
          "should": [
            {
              "exists": {
                "field": "pricing.cost",
                "boost": 2
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "exists": {
                      "field": "pricing.cost_min"
                    }
                  },
                  {
                    "exists": {
                      "field": "pricing.cost_max"
                    }
                  }
                ],
                "minimum_should_match": 2
              }
            }
          ],
          "minimum_should_match": 1,
          "boost": 1
        }
      }
    }
  }
}

Edit:

I want to add some detail on why this gives a score of 4 instead of 2 when you use the normal (object) data type. Let's start with your query. If you run your first query with "explain": true, you will see the following explanation for your document:

"_explanation": {
   "value": 4,
   "description": "ConstantScore(*:*)^4.0",
   "details": []
}

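This is because the exists query is a constant-score query (its score is just its boost), and the bool query adds up the scores of all its matching should clauses. With the normal object type, all three fields exist on the same document, so the score is 2 (pricing.cost) + 1 + 1 (pricing.cost_min and pricing.cost_max) = 4. Now let's look at the results after changing the field type to nested. The explanation changes as follows: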

{
  "_explanation": {
    "value": 2,
    "description": "Score based on 2 child docs in range from 0 to 2, best match:",
    "details": [
      {
        "value": 2,
        "description": "sum of:",
        "details": [
          {
            "value": 2,
            "description": "sum of:",
            "details": [
              {
                "value": 2,
                "description": "ConstantScore(FieldExistsQuery [field=pricing.cost])^2.0",
                "details": []
              }
            ]
          },
          {
            "value": 0,
            "description": "match on required clause, product of:",
            "details": [
              {
                "value": 0,
                "description": "# clause",
                "details": []
              },
              {
                "value": 1,
                "description": "_nested_path:pricing",
                "details": []
              }
            ]
          }
        ]
      }
    ]
  }
}

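As the description "Score based on 2 child docs in range from 0 to 2, best match:" shows, each nested object is scored separately as its own child document. Two children match here (the one with cost, and the one with cost_min and cost_max), each scoring 2, and the nested query combines their scores into the parent document's score (by default it averages them, since score_mode defaults to avg), so the final score is 2 rather than 4.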
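The per-child behaviour can be sketched in Python as well: each nested object is treated as its own hidden document, the bool query is scored against each one, and the matching child scores are combined into the parent score. This is a simplified model, not Elasticsearch code. It ignores the _nested_path filter (which contributes 0 in the explain output above), child_score and nested_score are hypothetical names, and the score_mode values mirror the nested query's avg (the default), max and sum options.

```python
from statistics import mean
from typing import Any, Optional


def child_score(entry: dict[str, Any]) -> Optional[float]:
    """Score one nested pricing object; return None if the bool query does not match it."""
    score = 0.0
    matched = False
    if entry.get("cost") is not None:  # exists on pricing.cost, boost 2
        score += 2.0
        matched = True
    if entry.get("cost_min") is not None and entry.get("cost_max") is not None:
        score += 1.0 + 1.0  # inner bool: two boost-1 exists clauses, both required
        matched = True
    return score if matched else None


def nested_score(doc: dict[str, Any], score_mode: str = "avg") -> float:
    """Combine the matching child scores into the parent score."""
    scores = [s for s in (child_score(e) for e in doc.get("pricing", [])) if s is not None]
    if not scores:
        return 0.0  # no matching child: the parent does not match (0.0 for simplicity)
    if score_mode == "avg":
        return mean(scores)
    if score_mode == "max":
        return max(scores)
    if score_mode == "sum":
        return sum(scores)
    raise ValueError(f"unsupported score_mode: {score_mode}")


example = {
    "pricing": [
        {"cost": 5000, "style": "fixed"},
        {"cost_min": 100, "cost_max": 500, "style": "range"},
        {"style": "fixed"},
    ]
}

print(nested_score(example))         # 2.0
print(nested_score(example, "sum"))  # 4.0
```

Both matching children score 2, so avg and max both give 2; with sum the two children would add up to 4 again, which is why the combining step matters.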