Question:
How can I force Stanford NLP's WordsToSentencesAnnotator to split sentences on dots? I tried adding -ssplit.boundaryMultiTokenRegex "//.", but it still does not always split on the . character.
I use Stanford CoreNLP version 3.5.2 (2015-04-20) on Windows 7 SP1 x64 Ultimate with Java 1.8.0_25 x64.
Example:
I have a text that contains two sentences: D R E L I N. Okay.
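To make the split I expect concrete, here is a minimal sketch in plain Python. It is not CoreNLP and says nothing about how WordsToSentencesAnnotator works internally; the helper name split_on_periods is hypothetical, and it simply breaks the text after every period that is followed by whitespace.

```python
import re

def split_on_periods(text):
    # Hypothetical helper for illustration only (not part of CoreNLP):
    # split after every period that is followed by whitespace.
    parts = re.split(r"(?<=\.)\s+", text.strip())
    # Drop any empty pieces left over from the split.
    return [p for p in parts if p]

print(split_on_periods("D R E L I N. Okay."))
# prints ['D R E L I N.', 'Okay.']
```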
I use Stanford NLP's WordsToSentencesAnnotator to split the text into sentences through the command line interface:
java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP
-annotators tokenize,ssplit -file test.txt
It returns just one sentence, D R E L I N. Okay., instead of two sentences, ['D R E L I N.', 'Okay.']; i.e., looking at the output XML file, the sentences node has only one sentence child: