Lucene 4.9: Adding sentiment to sentences during indexing


I have a SentimentAttribute class which extends AttributeImpl. I am also writing a SentenceSentimentTaggingFilter class which should:

  1. take an InputStream (consisting of text)
  2. tokenize it into sentences
  3. assign a sentiment to each sentence, i.e., by adding a SentimentAttribute to it

The problem is that Lucene only seems to provide functionality for tokenizing text into individual tokens, e.g., single words, and nothing for splitting it into sentences.

What is the best way to integrate this with the regular EnglishAnalyzer I'm also using during indexing? I would like to avoid running EnglishAnalyzer and my own analysis as two separate passes, and instead hook my analysis in between the processing steps of the EnglishAnalyzer (assuming that this is the fastest / most efficient way).

Thanks a lot in advance :)


There is 1 solution below


I'm doing something very similar, but in an earlier version of Lucene (3.0.2). You may want to look at the following class:

org.apache.lucene.wordnet.AnalyzerUtil
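
In case that class is not on your classpath, here is a rough sketch of the sentence-splitting step using only the JDK's java.text.BreakIterator rather than anything from Lucene. The class name SentenceSentimentSketch, the ScoredSentence holder and the keyword-based score method are all made up for illustration; the scoring in particular is just a placeholder for whatever sentiment model you actually use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: splits text into sentences with the
// JDK's BreakIterator and attaches a placeholder sentiment score to each one.
class SentenceSentimentSketch {

    /** A sentence, its character offsets in the input, and an assumed score. */
    static final class ScoredSentence {
        final String text;
        final int start;      // inclusive offset into the original text
        final int end;        // exclusive offset into the original text
        final double sentiment;

        ScoredSentence(String text, int start, int end, double sentiment) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.sentiment = sentiment;
        }
    }

    /** Walks the sentence boundaries and scores every non-empty sentence. */
    static List<ScoredSentence> split(String text) {
        List<ScoredSentence> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(new ScoredSentence(sentence, start, end, score(sentence)));
            }
        }
        return out;
    }

    // Placeholder scoring (an assumption): +1 for a positive keyword,
    // -1 for a negative one. Swap in a real sentiment classifier here.
    static double score(String sentence) {
        String lower = sentence.toLowerCase(Locale.ENGLISH);
        double s = 0.0;
        if (lower.contains("good") || lower.contains("great")) s += 1.0;
        if (lower.contains("bad") || lower.contains("terrible")) s -= 1.0;
        return s;
    }
}
```

Because each sentence keeps its character offsets, one possible way to hook this into the analyzer chain would be a custom TokenFilter that compares each token's OffsetAttribute against these ranges and copies the matching score into your SentimentAttribute. I have not tried that on 4.9, so treat it as an idea rather than a tested recipe.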

You've probably found a way to do this by now, but I hope this helps anyway.