Elasticsearch PHP does not return search results for a query without a space


I have added 15k records to the Elasticsearch index products_idx1 with type product.

The records contain product names like apple iphone 6, but when I search for iphone6 it returns empty data.

Here is my PHP Elasticsearch code:

<?php

use Elasticsearch\ClientBuilder;

require 'vendor/autoload.php';

$client = ClientBuilder::create()->build();

// Fields searched by the multi_match clause.
$values = ['name', 'name.prefix', 'name.suffix', 'sku'];

$params = [
    'client' => ['verify' => 1, 'connect_timeout' => 5],
    'from'   => 0,
    'size'   => 25,
    'body'   => [
        'query' => [
            'bool' => [
                'should' => [
                    [
                        'multi_match' => [
                            'query'    => 'iphone6',
                            'type'     => 'cross_fields',
                            'fields'   => $values,
                            'operator' => 'OR',
                        ],
                    ],
                    [
                        'match' => [
                            'all' => [
                                'query'     => 'iphone6',
                                'operator'  => 'OR',
                                'fuzziness' => 'AUTO',
                            ],
                        ],
                    ],
                ],
            ],
        ],
        'sort' => ['_score' => ['order' => 'desc']],
    ],
    'index'  => 'products_idx1',
];

$response = $client->search($params);
echo "<pre>";
print_r($response);

There are 2 best solutions below

Best answer

Using the shingle and pattern_replace token filters, it is possible to get results for all three search terms mentioned in the question and comments, namely iphone, iphone6 and appleiphone. A complete example follows.

As explained in the comment, the search-time tokens generated from the search term must match the index-time tokens generated from the indexed document in order to get a search result. I achieved this by creating a custom analyzer.
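
To make the token-matching idea concrete, here is a minimal Python sketch (plain Python, not Elasticsearch itself). It assumes a simplified stand-in for the default standard analyzer: lowercase the text, then split on anything that is not a letter or digit. The function names simple_analyze and matches are hypothetical, and real analysis has more rules than this.

```python
import re

def simple_analyze(text):
    """Rough stand-in for the standard analyzer (an assumption): lowercase,
    then split on runs of characters that are not letters or digits."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def matches(indexed_text, query_text):
    """With operator OR, a match query hits when at least one query token
    is also among the tokens produced for the indexed text."""
    return bool(set(simple_analyze(indexed_text)) & set(simple_analyze(query_text)))

print(simple_analyze("apple iphone 6"))      # ['apple', 'iphone', '6']
print(simple_analyze("iphone6"))             # ['iphone6']
print(matches("apple iphone 6", "iphone6"))  # False
print(matches("apple iphone 6", "iphone"))   # True
```

Under these assumptions, iphone6 stays a single token at search time while the document only holds iphone and 6 separately, so there is nothing to match on.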

Index mapping

{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "shingle",
            "lowercase",
            "space_filter"
          ]
        }
      },
      "filter": {
        "space_filter": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": "",
          "preserve_original": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "text_analyzer"
      }
    }
  }
}
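
For readers who want to see what this filter chain does to a string without a running cluster, the following Python sketch imitates it. It is an approximation under stated assumptions: the tokenizer is reduced to "runs of ASCII letters and digits", the shingle filter is assumed to use its defaults (two-word shingles joined by a space, unigrams kept), and the helper names are hypothetical.

```python
import re

def standard_tokenize(text):
    # Simplified stand-in for the standard tokenizer (ASCII input only).
    return re.findall(r"[0-9A-Za-z]+", text)

def shingle(tokens, size=2):
    """Emit each token, followed by the shingle starting at that token
    (adjacent tokens joined by a space) when enough tokens remain."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + size <= len(tokens):
            out.append(" ".join(tokens[i:i + size]))
    return out

def text_analyzer(text):
    tokens = shingle(standard_tokenize(text))    # "shingle" filter
    tokens = [t.lower() for t in tokens]         # "lowercase" filter
    return [t.replace(" ", "") for t in tokens]  # "space_filter" (pattern_replace)

print(text_analyzer("apple iphone 6"))
# ['apple', 'appleiphone', 'iphone', 'iphone6', '6']
print(text_analyzer("iphone6"))
# ['iphone6']
```

The shingle step is what brings iphone and 6 together into one token, and removing the space turns it into iphone6, the same token a search for iphone6 produces.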

Index your sample doc

{
  "title" : "apple iphone 6" 
}

Search query for appleiphone with result

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "appleiphone"
          }
        }
      ]
    }
  }
}

Result

"hits": [
      {
        "_index": "ana",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.3439677,
        "_source": {
          "title": "apple iphone 6",
          "title_normal": "apple iphone 6"
        }
      }
    ]

Search query for iphone6 with result

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "iphone6"
          }
        }
      ]
    }
  }
}

Result

 "hits": [
      {
        "_index": "ana",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.3439677,
        "_source": {
          "title": "apple iphone 6",
          "title_normal": "apple iphone 6"
        }
      }
    ]

And last but not least, the search query for iphone

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "iphone"
          }
        }
      ]
    }
  }
}

Result

"hits": [
      {
        "_index": "ana",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.3439677,
        "_source": {
          "title": "apple iphone 6",
          "title_normal": "apple iphone 6"
        }
      }
    ]

As my answer is already very long, I am adding the information about the analyze API in a separate answer, for readability and for readers who are not very familiar with analyzers in Elasticsearch and how they work.

As @Niraj mentioned in a comment on my previous answer, other documents work but he has an issue with the iphone6 query. The analyze API is very useful for debugging this kind of issue.

First, check the index-time tokens for the document that you think should match your search query, which in this case is apple iphone 6:

POST http://{{hostname}}:{{port}}/{{index}}/_analyze

{
  "text" : "apple iphone 6",
  "analyzer" : "text_analyzer"
}

And generated tokens

{
    "tokens": [
        {
            "token": "apple",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "appleiphone",
            "start_offset": 0,
            "end_offset": 12,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "iphone",
            "start_offset": 6,
            "end_offset": 12,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "iphone6", //note this carefully
            "start_offset": 6,
            "end_offset": 14,
            "type": "shingle",
            "position": 1,
            "positionLength": 2
        },
        {
            "token": "6",
            "start_offset": 13,
            "end_offset": 14,
            "type": "<NUM>",
            "position": 2
        }
    ]
}

As you can see, the analyzer we used also creates iphone6 as a token. Now check the search-time tokens:

{
  "text" : "iphone6",
  "analyzer" : "text_analyzer"
}

And tokens

{
    "tokens": [
        {
            "token": "iphone6",
            "start_offset": 0,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}

As you can see, the search-time analysis also produces iphone6 as a token, which is present among the index-time tokens as well. That is why the document matches the search query, as shown in the complete example in my first answer.
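
When debugging several queries, comparing the two _analyze responses by eye gets tedious. A small Python sketch of the same comparison is given here; the dictionaries are trimmed copies of the two responses shown in this answer (only the "token" key is kept), and token_set is a hypothetical helper name.

```python
def token_set(analyze_response):
    """Collect the token strings from an _analyze response body."""
    return {entry["token"] for entry in analyze_response["tokens"]}

# Trimmed copies of the index-time and search-time responses from this answer.
index_time = {"tokens": [{"token": t} for t in
                         ["apple", "appleiphone", "iphone", "iphone6", "6"]]}
search_time = {"tokens": [{"token": "iphone6"}]}

# Tokens present on both sides are the ones a match query can hit.
shared = token_set(index_time) & token_set(search_time)
print(shared)  # {'iphone6'}
```

An empty intersection for a given query would point to an analyzer mismatch between index time and search time.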