Elasticsearch relevance for a query based on most matches


I have the following mapping:

"posts":{
"properties":{
  "prop1": {
    "type": "nested",
    "properties": {
         "item1": {
            "type": "string",
            "index": "not_analyzed"
         },
         "item2": {
            "type": "string",
            "index": "not_analyzed"
         },
         "item3": {
            "type": "string",
            "index": "not_analyzed"
         }
      }
  },
  "name": {
    "type": "string",
    "index": "not_analyzed"
  }
 }
}

Consider documents indexed with this mapping, like the following:

{
"name": "Name1",
"prop1": [
    {
        "item1": "val1",
        "item2": "val2",
        "item3": "val3"           
    },
    {
        "item1": "val1",
        "item2": "val5",
        "item3": "val6"          
    }
  ]
}

And another object

{
"name": "Name2",
"prop1": [
    {
        "item1": "val2",
        "item2": "val7",
        "item3": "val8"           
    },
    {
        "item1": "val12",
        "item2": "val9",
        "item3": "val10"          
    }
  ]
}

Now say I want to search for documents whose prop1.item1 value is either "val1" or "val2". I also want the results sorted so that a document containing both val1 and val2 scores higher than one containing only one of "val1" or "val2".

I have tried the following query, but it doesn't seem to score based on the number of matches:

{
  "query": {
    "filtered": {
      "query": {"match_all": {}},
      "filter": {
        "nested": {
          "path": "prop1",
          "filter": {
            "or": [
              {
                "and": [
                  {"term": {"prop1.item1": "val1"}},
                  {"term": {"prop1.item2": "val2"}}
                ]
              },
              {
                "and": [
                  {"term": {"prop1.item1": "val1"}},
                  {"term": {"prop1.item2": "val5"}}
                ]
              },
              {
                "and": [
                  {"term": {"prop1.item1": "val12"}},
                  {"term": {"prop1.item2": "val9"}}
                ]
              }
            ]
          }
        }
      }
    }
  }
}

Although this should return both documents, the first document should score higher because it matches two of the clauses in the filter, whereas the second matches only one. Can someone help with the right query to get results sorted by most matches?


There are 2 best solutions below

Best answer

Scores aren't calculated for filters; use a nested query instead:

{
    "query": {
        "nested": {
            "score_mode": "sum",
            "path": "prop1",
            "query": {
                "bool": {
                    "should": [
                        {
                            "bool": {
                                "must": [
                                    {"match": {"prop1.item1": "val1"}},
                                    {"match": {"prop1.item2": "val2"}}
                                ]
                            }
                        },
                        {
                            "bool": {
                                "must": [
                                    {"match": {"prop1.item1": "val1"}},
                                    {"match": {"prop1.item2": "val5"}}
                                ]
                            }
                        },
                        {
                            "bool": {
                                "must": [
                                    {"match": {"prop1.item1": "val12"}},
                                    {"match": {"prop1.item2": "val9"}}
                                ]
                            }
                        }
                    ]
                }
            }
        }
    }
}
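
To see why "score_mode": "sum" puts the first document on top, here is a rough toy model in Python. It is only a sketch: it assumes every matching nested object contributes exactly 1 to the parent's score, whereas real Elasticsearch scores are relevance-weighted, so only the resulting order is meaningful. The helper name toy_sum_score is made up for illustration.

```python
# Toy model of nested "score_mode": "sum". Assumption: each matching clause
# adds 1; real Elasticsearch scores are weighted, so trust only the ordering.
docs = [
    {"name": "Name1", "prop1": [
        {"item1": "val1", "item2": "val2", "item3": "val3"},
        {"item1": "val1", "item2": "val5", "item3": "val6"},
    ]},
    {"name": "Name2", "prop1": [
        {"item1": "val2", "item2": "val7", "item3": "val8"},
        {"item1": "val12", "item2": "val9", "item3": "val10"},
    ]},
]

# Each pair mirrors one bool/must clause under "should" in the nested query.
wanted_pairs = [("val1", "val2"), ("val1", "val5"), ("val12", "val9")]


def toy_sum_score(doc):
    """Count (nested object, clause) matches; hypothetical helper."""
    return sum(
        1
        for nested in doc["prop1"]
        for item1, item2 in wanted_pairs
        if nested["item1"] == item1 and nested["item2"] == item2
    )


scores = {doc["name"]: toy_sum_score(doc) for doc in docs}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

In this simplified model Name1 has two matching nested objects and Name2 has one, which is the ordering the question asks for.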
Other answer

The biggest problem with your query is that you are using a filter, so no score is calculated. On top of that, the match_all query gives every document a score of 1. Replace the filtered query with a plain query, and use a bool query instead of the or/and filters.
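
If the list of value pairs changes often, the request body could be generated rather than written by hand. The following is a minimal Python sketch, not an official client API: the function name build_nested_should_query and its parameters are made up for illustration, and it assumes term queries are acceptable because the fields are not_analyzed (match queries would also work here).

```python
import json


def build_nested_should_query(path, pairs, score_mode="sum"):
    """Build a nested query with one bool/must clause per (item1, item2) pair.

    Hypothetical helper: the name and signature are illustrative only.
    """
    should = [
        {"bool": {"must": [
            # term queries suit not_analyzed fields (an assumption made here)
            {"term": {f"{path}.item1": item1}},
            {"term": {f"{path}.item2": item2}},
        ]}}
        for item1, item2 in pairs
    ]
    return {"query": {"nested": {
        "path": path,
        "score_mode": score_mode,
        # "should" clauses are optional, so each extra match adds to the score
        "query": {"bool": {"should": should}},
    }}}


body = build_nested_should_query(
    "prop1", [("val1", "val2"), ("val1", "val5"), ("val12", "val9")]
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search request body; because everything sits in query context instead of a filter, the clauses contribute to the score.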

Hope that helps.