I am trying to find a way to combine all tokens (terms) after tokenisation.
For example, the analyser below (my-analyser) produces n tokens after applying the "custom_stop" filter. Is there any way to combine all of those tokens into one single token?
I have looked at the 'fingerprint' filter, which does concatenate all tokens, but it also sorts them first, which I don't want. Please suggest a solution for this.
"analysis": {
"analyzer": {
"my-analyser": {
"tokenizer": "standard",
"filter": [ "custom_stop"]
}
},
"filter": {
"custom_stop": {
"type": "stop",
"ignore_case": true,
"stopwords": [ "elastic", "aws", "java" ]
}
}
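For reference, this is roughly how I am checking the token output (a Dev Tools style `_analyze` request; the index name `my-index` is just a placeholder for wherever the analyzer is defined):

```
POST my-index/_analyze
{
  "analyzer": "my-analyser",
  "text": "The concepts in elastic aws java are discussed here"
}
```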
For the input "The concepts in elastic aws java are discussed here" it produces these tokens: ["concepts", "discussed", "here"].
I want to combine these three tokens into one single token, like ["concepts discussed here"].
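Outside Elasticsearch, the behaviour I am after (and why fingerprint does not fit) can be sketched in plain Python. The token list comes from the example above; the second token list is a hypothetical one, chosen only to show that sorting can reorder tokens:

```python
def combine(tokens):
    # Desired behaviour: join the tokens in their original order
    return " ".join(tokens)

def fingerprint_like(tokens):
    # What the fingerprint filter does: de-duplicate and sort before joining
    return " ".join(sorted(set(tokens)))

tokens = ["concepts", "discussed", "here"]
print(combine(tokens))            # concepts discussed here

# Hypothetical tokens that are not already in alphabetical order,
# to show why the fingerprint filter's sorting is a problem for me:
other = ["here", "the", "concepts"]
print(combine(other))             # here the concepts
print(fingerprint_like(other))    # concepts here the
```

In the first example the two approaches happen to agree only because the tokens are already alphabetical; in general the order would be lost.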