Elasticsearch index for n-grams?


Say I have the sentence "This is a new city".

  1. Does Elasticsearch create index entries for every possible substring of a word? For example, for the word "city", will it index "it", "ty", "ity", "cit", etc.?
  2. Are these index entries created when the document is stored, or at query time?
  3. Are these index entries kept in memory or on disk?

alpert On BEST ANSWER
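  1. That depends on your analyzer. By default Elasticsearch uses the standard analyzer, whose standard tokenizer divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm, and whose lowercase filter then lowercases each term. That means your sentence will be indexed as the terms this, is, a, new, city, with no substrings such as "it" or "cit". If you want substrings indexed, you can define a custom analyzer, for example one based on the ngram tokenizer.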

  2. Documents are analyzed and indexed when you put them into Elasticsearch, that is, at index time rather than at query time.

  3. The data is kept on the file system: https://www.elastic.co/blog/found-dive-into-elasticsearch-storage

Here is a blog post about internals: https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
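
To make point 1 concrete, here is a minimal Python sketch of the difference between word-level terms and n-gram terms. It is not Elasticsearch code: `simple_tokenize` and `ngrams` are hypothetical helpers that only approximate the behaviour (a regex split instead of real Unicode Text Segmentation). The `ngram_index_settings` dict shows roughly what a custom analyzer definition could look like. The names `my_ngram` and `my_ngram_analyzer` are made up for illustration, and the gram sizes 2 and 3 are an assumption chosen to match the substrings in the question.

```python
import re


def simple_tokenize(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def ngrams(token, min_gram, max_gram):
    """Return every contiguous substring with length in [min_gram, max_gram]."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]


# Illustrative index settings (names are hypothetical) that would ask
# Elasticsearch to index 2- and 3-character grams instead of whole words.
ngram_index_settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "my_ngram": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "my_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "my_ngram",
                    "filter": ["lowercase"],
                }
            },
        }
    }
}

if __name__ == "__main__":
    print(simple_tokenize("This is a new city"))  # word-level terms
    print(ngrams("city", 2, 3))                   # substring terms
```

With the default setup only the five whole words end up as terms. With an n-gram analyzer, "city" alone contributes "ci", "cit", "it", "ity" and "ty", which is why n-gram indexes grow much larger than word-level ones.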