How to include partially-matched results in a cross-fields query?


I'm building a cross-fields search using Elasticsearch 6.2. I'm having trouble figuring out how to handle partial matches for my search term.

My query:

{
   "index":"course",
   "type":"course",
   "body":{
      "query":{
         "bool":{
            "must":{
               "multi_match":{
                  "query":"macroeconomics",
                  "fields":[
                     "course_name",
                     "course_number",
                     "university_name"
                  ],
                  "type":"cross_fields"
               }
            }
         }
      },
      "sort":[
         {
            "_score":"desc"
         },
         {
            "students":{
               "order":"desc"
            }
         }
      ],
      "from":0,
      "size":50
   }
}

The query returns decent results for documents that exactly match the macroeconomics search term in cross-fields mode.

The problem is that as soon as I change the search term to macro, I get a few results only for the macro term (exact matches), while my expected results would include:

  • any results for the macro term (as an exact match), plus
  • any results for the macro term (as a partial match), like e.g. in "macroeconomics"

I'm aware that using wildcards is performance-heavy, so that's not an optimal way.

How do I adjust my query to get the expected results described above? It's not just about treating "macro" as a prefix, but as a potential substring that can occur anywhere within terms in other results.


Basically, you will need to create a custom analyzer. For reference, see the Elasticsearch documentation on the NGram tokenizer.

If you just want to give it a go, you can set up the NGram tokenizer by declaring the following in your index settings:

  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }

"my_analyzer" is the analyzer’s name that we will use for the ngram field Then for your mappings, you need to map the analyzer to the field

 "mappings": {
    "_doc": {
      "properties": {
        "course_name": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
    }
    ...

Just add the analyzer to every field you want partial matching on.
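
Putting the two pieces together, a full index-creation request would look roughly like this (a sketch only, assuming you want ngrams on all three fields from your question; note that analyzer settings can't be changed on an existing field, so on a live index you would have to reindex):

```
PUT /course
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "course_name": {
          "type": "text",
          "analyzer": "my_analyzer"
        },
        "course_number": {
          "type": "text",
          "analyzer": "my_analyzer"
        },
        "university_name": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
```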

UPDATE: You can validate your analyzer with the _analyze API:

GET yourindexname/_analyze 
{
  "analyzer": "my_analyzer", 
  "text":     "macroeconomics"
}
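
If the analyzer is wired up correctly, that call should break macroeconomics into overlapping 3-grams along the lines of:

```
mac, acr, cro, roe, oec, eco, con, ono, nom, omi, mic, ics
```

Since the same analyzer runs on the query string by default, searching for macro emits the grams mac, acr and cro, all of which occur among the indexed grams of macroeconomics, so the document matches without any wildcards.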

Another gram-size combination I have seen a lot is:

"min_gram" : "3",
"max_gram" : "8"

But it all depends on your use case.
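
One caveat worth knowing (this is general Elasticsearch mapping behavior, not something specific to your query): ngram-analyzing the query string can get noisy, because any unrelated word sharing a single 3-gram with the query will match. If that happens, you can keep ngrams at index time but analyze the query with the standard analyzer instead, by setting a separate search_analyzer on the field:

```
"course_name": {
  "type": "text",
  "analyzer": "my_analyzer",
  "search_analyzer": "standard"
}
```

This only helps when max_gram is large enough to cover typical query terms (e.g. the 3 to 8 range above), since the whole query token, such as macro, must then exist as an indexed gram.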