I have a SentimentAttribute class which extends AttributeImpl. Also, I am currently writing a SentenceSentimentTaggingFilter class which should
- take an InputStream (consisting of text)
- tokenize it into sentences
- assign a sentiment to each sentence, i.e., by adding a SentimentAttribute to it
The problem I currently have is that Lucene only seems to provide functionality for tokenizing text into individual tokens, e.g., single words, but nothing for splitting text into sentences.
What is the best way to integrate this with the regular EnglishAnalyzer I'm also using during indexing? I would like to avoid running both EnglishAnalyzer and my analysis in parallel, and instead hook my analysis in between the processing steps of the EnglishAnalyzer (assuming that this is the fastest / most efficient way).
Thanks a lot in advance :)
I'm actually doing something very similar but in an earlier version of Lucene, V3.0.2. You may want to look at the following class:
org.apache.lucene.wordnet.AnalyzerUtil
You've probably found a way to do this by now, but I hope it helps anyway.
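For the sentence-splitting step on its own, one option that does not depend on any Lucene version is the JDK's java.text.BreakIterator. The following is only a minimal sketch under that assumption: SentenceSplitter and splitSentences are hypothetical names, not part of Lucene or of the classes mentioned above, and the sample text is made up. It does not show how to attach a SentimentAttribute or how to plug into EnglishAnalyzer.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not a Lucene class): splits plain text into
// sentences using the JDK's locale-aware sentence BreakIterator.
class SentenceSplitter {

    static List<String> splitSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "I love this movie. The ending was terrible! Would I watch it again?";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

If the sentences need to be mapped back to positions in the original text (for example, to tag the tokens that fall inside each sentence), the start and end boundaries from the loop could be kept alongside each sentence instead of being discarded.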