Forcing Stanford NLP's WordsToSentencesAnnotator to split sentences on dots


Question:

How can I force Stanford NLP's WordsToSentencesAnnotator to split sentences on dots? I tried adding -ssplit.boundaryMultiTokenRegex "//.", but it still does not split on every period.

I use Stanford CoreNLP version 3.5.2 (2015-04-20) on Windows 7 SP1 x64 Ultimate with Java 1.8.0_25 x64.


Example:

I have a text that contains two sentences: D R E L I N. Okay.

I use Stanford NLP's WordsToSentencesAnnotator to split the text into sentences through the command line interface:

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP 
             -annotators tokenize,ssplit  -file test.txt

It returns just one sentence, D R E L I N. Okay., instead of two sentences, ['D R E L I N.', 'Okay.']. That is, in the output XML file, the sentences node has only one sentence child.
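To make the expected result concrete, here is a minimal sketch in plain Java of the split being asked for. It does not use CoreNLP and says nothing about how WordsToSentencesAnnotator works internally; `NaiveDotSplitter` and its `split` method are hypothetical names introduced only for this illustration, and the rule (break after every period followed by whitespace) is an assumption about the desired behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Stanford CoreNLP.
public class NaiveDotSplitter {

    // Splits text after every period that is followed by whitespace,
    // keeping the period attached to the sentence it ends.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.split("(?<=\\.)\\s+")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Prints: [D R E L I N., Okay.]
        System.out.println(split("D R E L I N. Okay."));
    }
}
```

A rule this blunt would also break after abbreviations such as "Dr." or "e.g.", so it only describes the target output for the example text, not a general replacement for the annotator.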
