ElasticSearch performance considerations while mapping string fields as both text and keyword?

I have a question regarding the tradeoffs/performance considerations to keep in mind while mapping string fields as both text and keyword vs just one of those.

I have a use case where mapping around 25-30 string fields as both text and keyword would be nice to have, but if there are serious performance considerations, I would instead go through the fields and map each one only as the type it will most often be searched as.

I have not been able to find much information online about this. Hence asking here.

ElasticSearch Version 7.10 Thanks!

There are 2 best solutions below

On BEST ANSWER

The default mappings provided by ES map a string field as both text and keyword mostly because it's convenient: the field can then be used in different contexts without having to think too hard about it. It's also a good way of bootstrapping new projects without worrying too much about that aspect until later in the project.
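For reference, here is a minimal sketch of the shape that dynamic mapping gives a new string field: a text field with a keyword sub-field limited by ignore_above: 256. The field name "title" is a hypothetical example, and the dict is only the mapping body, not a full index-creation request.

```python
import json

# Shape of the mapping that dynamic mapping produces for a new string field:
# a `text` field plus a `keyword` sub-field capped by `ignore_above: 256`.
# "title" is a hypothetical field name used only for illustration.
default_string_mapping = {
    "properties": {
        "title": {
            "type": "text",
            "fields": {
                "keyword": {"type": "keyword", "ignore_above": 256}
            },
        }
    }
}

# Print the mapping body as JSON, ready to be pasted into a request.
print(json.dumps(default_string_mapping, indent=2))
```

With this mapping, full-text queries go against "title" and exact matches, sorting and aggregations go against "title.keyword".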

However, if you're truly serious about your mappings and the performance of your cluster, you should always give as much thought as possible to why you map a field in a certain way.

There are a few basic rules (but your mileage may always vary) in the following (non-exhaustive) list:

  • IDs, codes, keys, etc, that you usually use in exact searches can be mapped as keyword only (and/or wildcard depending on your search use cases).
  • If you have longer pieces of text closer to natural language that you might want to run full-text searches on, it's usually a good idea to map them as text.
  • The corollary to the previous rule is that if you know that you'll never want to run full text searches on some field, don't map it as text as there is a non-negligible overhead related to indexing text fields during the analysis process.
  • ...
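
The rules above could be sketched as a small helper that builds an explicit mapping per field instead of relying on the default. The usage labels ("exact", "full_text", "both") and the field names are hypothetical, chosen only to illustrate the idea; this is not an Elasticsearch API.

```python
def field_mapping(usage: str) -> dict:
    """Return a field mapping for a hypothetical usage label."""
    if usage == "exact":
        # IDs, codes, keys: exact matching, sorting, aggregations.
        return {"type": "keyword"}
    if usage == "full_text":
        # Natural-language text: analyzed for full-text search only.
        return {"type": "text"}
    if usage == "both":
        # Only when both kinds of query are really needed on the field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise ValueError(f"unknown usage: {usage!r}")


# Hypothetical fields and how each one is expected to be queried.
fields = {
    "order_id": "exact",
    "description": "full_text",
    "product_name": "both",
}

mappings = {
    "properties": {name: field_mapping(usage) for name, usage in fields.items()}
}
```

Writing the intended usage down per field like this makes it obvious which of the 25-30 fields actually need the double mapping.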

As said, the above list is non-exhaustive, but it gives you some pointers. The bottom line is that you need to think hard about your data and what you want to do with it. Once you know the use cases you need to support, you'll know how to map your fields. I would never leave a default text/keyword mapping in place if there's no reason for it.

On

The performance of your search and indexing depends on the size of your string fields. If you have a large string and map it as a keyword, it will have a heavy impact on your indexing and search performance. If you decide to map a field as both text and keyword, be sure to set ignore_above on the keyword sub-field, because Lucene's term byte-length limit is 32766: a longer value cannot be indexed as a single keyword term, and with ignore_above set Elasticsearch skips indexing such values as keyword instead of failing on them.
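
One detail worth keeping in mind when choosing the value: ignore_above counts characters, while the Lucene limit is in bytes, and a UTF-8 character can take up to 4 bytes. A rough sketch of the arithmetic follows; the helper names are hypothetical and nothing here calls Elasticsearch.

```python
LUCENE_MAX_TERM_BYTES = 32766  # Lucene's limit on the byte length of one term


def safe_ignore_above(max_bytes_per_char: int = 4) -> int:
    """Largest character count that cannot exceed the byte limit.

    The default of 4 covers the worst case for UTF-8 text.
    """
    return LUCENE_MAX_TERM_BYTES // max_bytes_per_char


def fits_in_one_term(value: str) -> bool:
    """Check whether a string's UTF-8 encoding fits within the byte limit."""
    return len(value.encode("utf-8")) <= LUCENE_MAX_TERM_BYTES


print(safe_ignore_above())   # worst-case UTF-8 text
print(safe_ignore_above(1))  # pure ASCII text
```

For fields that are only ever filtered or aggregated on short values, a much smaller ignore_above (such as the default 256) is usually the more sensible choice.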

The type of analyzer you use for your string fields also has an impact.