Elasticsearch: Configuring a custom analyzer implementation


I am currently evaluating whether and how a legacy Lucene-based analyzer component can be moved to Elasticsearch (0.19.18). Since the legacy code is based on Lucene, I wrapped the analyzer in an ES plugin. The analyzer's configuration looks like this:

index.analysis.analyzer.myAnalyzer.type: myAnalyzer
index.analysis.analyzer.default.type: myAnalyzer
index.analysis.analyzer.default_index.type: myAnalyzer
index.analysis.analyzer.default_search.type: myAnalyzer

So far so good.

curl -XGET 'localhost:9200/_analyze' -d 'Some text'

returns an object containing the correctly tokenized text, but

curl -XGET 'localhost:9200/<name-of-my-index>/_analyze' -d 'Some text'

returns text that is not tokenized at all. Apparently, instead of myAnalyzer, only the lowercase filter is applied. The documents in the index are not analyzed correctly either.
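One way to narrow this down is to ask the index-level _analyze endpoint for the custom analyzer by name: if that request tokenizes correctly while the plain request does not, the analyzer is registered on the node but is not being picked up as the index default. A sketch of the comparison, assuming the analyzer was registered under the name myAnalyzer:

```shell
# Explicitly request the custom analyzer by name (0.19.x query-parameter syntax).
curl -XGET 'localhost:9200/<name-of-my-index>/_analyze?analyzer=myAnalyzer' -d 'Some text'

# For comparison: whatever analyzer the index actually treats as its default.
curl -XGET 'localhost:9200/<name-of-my-index>/_analyze' -d 'Some text'
```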

The index mappings look like this (output from the head plugin):

"mappings": {
    "item": {
        "analyzer": "myAnalyzer",
        "properties": {
            "id": {
                "type": "string"
            },
            "itemnumber": {
                "type": "string"
            },
            "articletext": {
                "analyzer": "myAnalyzer",
                "type": "string"
            },
            "sortvalue": {
                "type": "string"
            },
            "salesstatus": {
                "format": "dateOptionalTime",
                "type": "date"
            }
        }
    }
}
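A possible cause worth checking: the index.analysis.* settings are index-level settings, so they only take effect for indices created after the settings are in place. If the index was created before the plugin configuration was added, it will keep using the built-in default analyzer. A minimal sketch of recreating the index with both the analyzer setting and the mapping (index and field names taken from the question; this assumes myAnalyzer is the type name the plugin registers):

```shell
# Create the index with the default analyzer set at creation time,
# so both indexing and search use myAnalyzer.
curl -XPUT 'localhost:9200/<name-of-my-index>' -d '{
    "settings": {
        "index.analysis.analyzer.default.type": "myAnalyzer"
    },
    "mappings": {
        "item": {
            "properties": {
                "articletext": { "type": "string", "analyzer": "myAnalyzer" }
            }
        }
    }
}'
```

After recreating the index this way, the documents need to be reindexed for the new analysis chain to apply to stored content.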

Since I am new to ES, I can't figure out what the reason for this behaviour is. Does anybody have an idea?


This is how I set a custom default analyzer in Elasticsearch:

index:
  analysis:
    analyzer:
      default:
        filter: [lowercase]
        tokenizer: whitespace
        type: custom

Works like a charm.
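The YAML above goes into the node configuration and applies to newly created indices. The same default analyzer can also be expressed per index through the REST API at creation time, which avoids restarting the node; a sketch (the index name is a placeholder):

```shell
# Per-index equivalent of the YAML config: a custom default analyzer
# built from the whitespace tokenizer and the lowercase filter.
curl -XPUT 'localhost:9200/myindex' -d '{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"]
                }
            }
        }
    }
}'
```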