Elasticsearch - Search that returns too many bad results

Question


Asked by arno on 30 April 2021 at 10:41 (115 views)

I have a working Elasticsearch setup, but its matching is far too broad: it returns many results for terms that have nothing to do with the query. I'm looking for a way to narrow these results down.

On a sample of fake text, when I search for the term music, the terms that come out in the highlights are:
must, much, alice, inside, patriotic, noticed

I suspect the ngram filter is not helping me here, but I think I really need it to get a better search.

Here is my configuration:

{
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
        "analyzer": {
            "default": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "mySnowball", "myNgram"]
            },
            "default_search": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["standard", "lowercase", "mySnowball", "myNgram"]
            }
        },
        "filter": {
            "mySnowball": {
                "type": "snowball",
                "language": "English"
            },
            "myNgram": {
                "type": "ngram",
                "min_gram": 2,
                "max_gram": 6
            }
        }
    }
}

Here is my request:

{
    "query": {
        "bool": {
            "should": [{
                "match": {
                    "content": "music"
                }
            }, {
                "match": {
                    "url": "music"
                }
            }, {
                "match": {
                    "h1": "music"
                }
            }, {
                "match": {
                    "h2": "music"
                }
            }],
            "minimum_should_match": 1
        }
    },
    "min_score": 8
}

My document is quite simple:

content => text,
url => text,
h1 => text,
h2 => text,

And the mapping too:

$configMapping  = [
    'content' => ['type' => 'text', 'boost' => 6],
    'url'     => ['type' => 'text', 'boost' => 6],
    'h1'      => ['type' => 'text', 'boost' => 9],
    'h2'      => ['type' => 'text', 'boost' => 7]
];

I welcome any change that will let me get only relevant results.


**Shira Elitzur** · Answer 1 · 2021-05-04T06:09:17.807000

As you said yourself, analyzing with 'ngram' is the reason you get all these unrelated results.

In every result you get, you can spot the token (a 2-character token, since 2 is the min_gram of your n-gram filter) that it shares with the query term 'music': must, much, alice, inside, patriotic, noticed
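To make the overlap concrete, here is a rough Python sketch that imitates only the n-gram step with the question's min_gram of 2 and max_gram of 6. It is an approximation under stated assumptions: it works on the raw words and skips the standard tokenizer, the lowercase filter and the snowball stemmer, so the tokens in a real index may differ. The helper name `ngrams` is made up for this illustration and is not an Elasticsearch API.

```python
def ngrams(token, min_gram=2, max_gram=6):
    """Return every substring of token with a length from min_gram to max_gram."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


# The query term is n-grammed too, because default_search also uses myNgram.
query_grams = ngrams("music")
highlighted = ["must", "much", "alice", "inside", "patriotic", "noticed"]

# For each highlighted word, keep the grams it has in common with the query.
shared = {word: sorted(ngrams(word) & query_grams) for word in highlighted}

for word, grams in shared.items():
    print(word, grams)
```

Every word in the list shares at least one short gram with 'music' (for example 'mu' for much, 'ic' for alice), and a single shared gram is enough for a match query to count the document as a hit.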

Start by removing this filter from your analyzers, then continue tuning the results from there.
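As a minimal sketch of that first step, the settings from the question can be rewritten without the n-gram filter. The function `drop_filter` below is a hypothetical helper written for this answer, not part of Elasticsearch or Elastica, and the settings are copied from the question as a plain Python dict. Documents that were already indexed keep their old n-gram tokens, so they would need to be reindexed after the change.

```python
import copy


def drop_filter(settings, filter_name):
    """Return a copy of settings with filter_name removed from every
    analyzer's filter chain and from the custom filter definitions."""
    result = copy.deepcopy(settings)
    analysis = result.get("analysis", {})
    for analyzer in analysis.get("analyzer", {}).values():
        analyzer["filter"] = [
            name for name in analyzer.get("filter", []) if name != filter_name
        ]
    analysis.get("filter", {}).pop(filter_name, None)
    return result


# Settings as posted in the question.
settings = {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
        "analyzer": {
            "default": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "mySnowball", "myNgram"],
            },
            "default_search": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["standard", "lowercase", "mySnowball", "myNgram"],
            },
        },
        "filter": {
            "mySnowball": {"type": "snowball", "language": "English"},
            "myNgram": {"type": "ngram", "min_gram": 2, "max_gram": 6},
        },
    },
}

trimmed = drop_filter(settings, "myNgram")
print(trimmed["analysis"]["analyzer"]["default"]["filter"])
```

The helper returns a new dict and leaves the original untouched, which makes it easy to compare results from both variants on a test index while tuning.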
