Elasticsearch input analysis

Can Elasticsearch split an input string into categorized words? For example, if the input is

4star wi-fi 99$

and we are searching hotels with ES, is it possible to analyze/tokenize this string as 4star - hotel level, wi-fi - hotel amenities, 99$ - price?

yep, it's a noob question :)

Best answer

Yes and no.

By default, query_string searches will work against the automatically created _all field. The contents of the _all field come from literally and naively combining all fields into a single analyzed string.

As such, if you have a "4star" rating, a "wi-fi" amenity, and a "99$" price, then all of those values would be inside of the _all field and you should get relevant hits against it. For example:

{
  "level" : "4star",
  "amenity" : ["pool", "wi-fi"],
  "price" : 99.99
}

The problem is that, without client-side effort, you will not know which field(s) matched when searching against _all. The response does not break down where each value came from; it simply reports a score that reflects the overall relevance.
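
To give a rough idea of what that client-side effort could look like, here is a minimal Python sketch that checks which top-level fields of a returned document contain each search term. The function name fields_containing is hypothetical, and the case-insensitive substring test is only an assumption for illustration; it does not reproduce what Elasticsearch's analyzers actually do.

```python
def fields_containing(source, terms):
    """Map each term to the top-level fields of `source` whose value contains it."""
    matches = {}
    for term in terms:
        needle = term.lower()
        hit_fields = []
        for field, value in source.items():
            # Treat scalars as one-element lists so arrays and scalars share a path.
            values = value if isinstance(value, list) else [value]
            # Naive substring check (an assumption, not Elasticsearch's analysis).
            if any(needle in str(v).lower() for v in values):
                hit_fields.append(field)
        matches[term] = hit_fields
    return matches


doc = {"level": "4star", "amenity": ["pool", "wi-fi"], "price": 99.99}
print(fields_containing(doc, ["4star", "wi-fi", "99$"]))
# {'4star': ['level'], 'wi-fi': ['amenity'], '99$': []}
```

Note that "99$" matches nothing here, because the stored price is the number 99.99 rather than the text the user typed.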

If you have some way of knowing which field each term (or terms) is meant to search against, then you can easily do this yourself (quotes aren't required, but they help avoid mistakes with spaces). This is the input that you might provide to the query_string query:

level:"4star" amenity:"wi-fi" price:[* TO 100}

You could further complicate this by using a spelled-out query:

{
  "query" : {
    "bool" : {
      "must" : [
        { "match" : { "level" : "4star" } },
        { "match" : { "amenity" : "wi-fi" } },
        {
          "range" : {
            "price" : {
              "lt" : 100
            }
          }
        }
      ]
    }
  }
}
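
If the search terms arrive already separated by field (from a form, say), a request body like the one above can be assembled programmatically. This is only a sketch: build_hotel_query and its parameters are hypothetical names, and the field names level, amenity, and price are taken from the example document earlier in this answer.

```python
import json


def build_hotel_query(level=None, amenities=(), max_price=None):
    """Build a bool/must request body from already-categorized inputs."""
    must = []
    if level is not None:
        must.append({"match": {"level": level}})
    for amenity in amenities:
        # One match clause per amenity, so every selected amenity is required.
        must.append({"match": {"amenity": amenity}})
    if max_price is not None:
        # "lt" is strictly less than, as in the hand-written query.
        must.append({"range": {"price": {"lt": max_price}}})
    return {"query": {"bool": {"must": must}}}


body = build_hotel_query(level="4star", amenities=["wi-fi"], max_price=100)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the search request body by whichever HTTP or Elasticsearch client you already use.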

Naturally, the last two requests require advance knowledge of which field each search term refers to. You could certainly use the $ in "99$" as a tip-off for price, but there is no such hint for the others. Chances are you wouldn't have users typing in "4 stars" anyway; more likely they would pick from checkboxes or other form-based selections, which makes this approach quite realistic.
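
As an illustration of the $ tip-off, the following Python sketch pulls price-like tokens out of the raw input and leaves everything else unclassified. The function name split_price_tokens and the regular expression are assumptions made for this example; the remaining tokens would still need another source of knowledge (such as form fields) to be assigned to level or amenity.

```python
import re

# A number with a leading or trailing "$", e.g. "99$", "$99" or "99.50$".
PRICE_TOKEN = re.compile(r"^\$?(\d+(?:\.\d+)?)\$?$")


def split_price_tokens(text):
    """Return (prices, unclassified_tokens) for a whitespace-separated input."""
    prices, unclassified = [], []
    for token in text.split():
        match = PRICE_TOKEN.match(token)
        # Require a "$" so that bare numbers are not mistaken for prices.
        if match and "$" in token:
            prices.append(float(match.group(1)))
        else:
            unclassified.append(token)
    return prices, unclassified


print(split_price_tokens("4star wi-fi 99$"))
# ([99.0], ['4star', 'wi-fi'])
```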

Technically, you could create a custom analyzer that recognizes each term based on its position in the input, but that's not really a good or useful idea.