Given the input: "ABC regional private coastal area"
Tokenization I want (using ShingleFilterFactory): "ABC regional private coastal area", "ABC regional private coastal", "ABC regional private", "ABC regional", "ABC".
Actual results: "ABC regional private coastal area", "ABC regional private coastal", "ABC regional", "ABC", "regional", etc.
Sometimes it also creates tokens like "regional _ coastal", "regional _ coastal area", "_ coastal".
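To make the target concrete, here is a minimal Python sketch of the tokenization I am after: word-level prefixes that always start at the first word. This is only an illustration of the desired output, not Solr code; the function name `word_prefixes` is made up for this example.

```python
def word_prefixes(text):
    """Return every word-level prefix of text, longest first."""
    words = text.split()
    # Keep the first n words for n = len(words) down to 1.
    return [" ".join(words[:n]) for n in range(len(words), 0, -1)]


for token in word_prefixes("ABC regional private coastal area"):
    print(token)
```

Every token starts with "ABC"; nothing starts at "regional" or any later word.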
Is there any filter or tokenizer that will help me achieve this result?
Already tried: EdgeNGram (character-level token split), NGram (character-level token split), ShingleFilterFactory (word-level token split).
Results: shingle comes close, but it also creates extra tokens. For example, "hello world sample" is tokenized into "hello world", "world", "sample", which gives me unnecessary results for both "sample" and "world" that I don't need.
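As far as I understand it, the extra tokens appear because shingles are built starting at every word position, not only the first one. The sketch below is a rough Python imitation of word-level shingling under that assumption; it is not the actual ShingleFilterFactory implementation, and the name `word_shingles` and its `min_size`/`max_size` parameters are hypothetical.

```python
def word_shingles(text, min_size=1, max_size=3):
    """Imitate word-level shingling: word n-grams starting at every position."""
    words = text.split()
    shingles = []
    for start in range(len(words)):
        for size in range(min_size, max_size + 1):
            # Only emit shingles that fit inside the input.
            if start + size <= len(words):
                shingles.append(" ".join(words[start:start + size]))
    return shingles


for token in word_shingles("hello world sample"):
    print(token)
```

Only the shingles that start at the first word ("hello", "hello world", "hello world sample") are the ones I need; those starting at "world" or "sample" are the unwanted ones.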
Thanks in advance.
Use these links to look at the query and results: [Query Performed](https://i.stack.imgur.com/TUHHn.png), Shingle, EdgeNGram