Question:
How can I force Stanford NLP's WordsToSentencesAnnotator to split sentences on every dot? I tried adding -ssplit.boundaryMultiTokenRegex "//.", but it still does not always split on the period.
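One detail in the regex above is worth checking. Forward slashes are not escape characters in Java regular expressions, so "//." matches a literal "//" followed by any one character, and it can never match a lone "." token. A literal period is written \. (or "\\." inside a Java string literal). Even a correct period regex only fires on a token whose whole text is ".", so if the tokenizer keeps "N." as a single token (an assumption about how single-letter initials are tokenized), no split happens there. The sketch below assumes that a per-token boundary regex is matched against the entire token text; the class and the isBoundary helper are hypothetical and only wrap java.util.regex.Pattern.matches.

```java
import java.util.regex.Pattern;

public class BoundaryRegexDemo {

    // Hypothetical helper: true if the WHOLE token matches the regex,
    // modeling (by assumption) how a per-token boundary regex is applied.
    public static boolean isBoundary(String regex, String token) {
        return Pattern.matches(regex, token);
    }

    public static void main(String[] args) {
        // "//." needs two slashes plus one more character, so "." never matches.
        System.out.println(isBoundary("//.", "."));
        // "\\." in Java source is the regex \. (an escaped literal period).
        System.out.println(isBoundary("\\.", "."));
        // A period glued to "N" is not a separate "." token, so it does not match.
        System.out.println(isBoundary("\\.", "N."));
    }
}
```

This prints false, true, false, one value per line.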
I use Stanford CoreNLP version 3.5.2 (2015-04-20) on Windows 7 SP1 x64 Ultimate with Java 1.8.0_25 x64.
Example:
I have a text that contains two sentences: D R E L I N. Okay.
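To make the desired behavior concrete, here is a minimal post-processing sketch that closes a sentence after every token ending in "." and so produces the two sentences ['D R E L I N.', 'Okay.']. It does not use CoreNLP: the token list is a hypothetical tokenization of the example text that assumes "N." stays one token while the final period is a separate "." token, and DotSplitter / splitOnDots are illustrative names. Note the trade-off: splitting on every dot also splits after abbreviations such as "Dr.".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DotSplitter {

    // Groups tokens into sentences, closing a sentence after every token that ends with '.'.
    // A bare "." token is attached to the previous token without a space.
    public static List<String> splitOnDots(List<String> tokens) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : tokens) {
            if (current.length() > 0 && !token.equals(".")) {
                current.append(' ');
            }
            current.append(token);
            if (token.endsWith(".")) {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            sentences.add(current.toString()); // trailing text without a final dot
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Hypothetical tokenization of "D R E L I N. Okay." (assumes "N." is one token)
        List<String> tokens = Arrays.asList("D", "R", "E", "L", "I", "N.", "Okay", ".");
        System.out.println(splitOnDots(tokens)); // prints [D R E L I N., Okay.]
    }
}
```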
I use Stanford NLP's WordsToSentencesAnnotator to split the text into sentences through the command line interface:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -file test.txt
It returns a single sentence, D R E L I N. Okay., instead of the two sentences ['D R E L I N.', 'Okay.']. That is, in the output XML file, the sentences node has only one sentence child:
