Forcing Stanford NLP's WordsToSentencesAnnotator to split sentences on dots

158 Views Asked by Franck Dernoncourt At 28 July 2025 at 22:39

Question:

How can I force Stanford NLP's WordsToSentencesAnnotator to always split sentences on dots? I tried adding the option -ssplit.boundaryMultiTokenRegex "//.", but it still does not split on every dot.

I use Stanford CoreNLP version 3.5.2 (2015-04-20) on Windows 7 SP1 x64 Ultimate with Java 1.8.0_25 x64.

Example:

I have a text that contains two sentences: D R E L I N. Okay.

I use Stanford NLP's WordsToSentencesAnnotator to split the text into sentences through the command line interface:

java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit -file test.txt

It returns just one sentence, D R E L I N. Okay., instead of the two sentences ['D R E L I N.', 'Okay.']. That is, in the output XML file, the sentences node has only one sentence child.

[Screenshot of the output XML, showing a single sentence node]
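To make the expected result concrete, here is a minimal sketch in plain Java of the split I am after. It does not use CoreNLP at all; the class name DotSplitDemo and the method splitOnDots are hypothetical names chosen for illustration. It simply ends a sentence at every '.', so it would also split inside abbreviations and version numbers such as 3.5.2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Stanford CoreNLP: a naive splitter
// that treats every '.' as a sentence boundary.
final class DotSplitDemo {

    static List<String> splitOnDots(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            current.append(c);
            if (c == '.') {
                // Close the current sentence, keeping the dot with it.
                String sentence = current.toString().trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
                current.setLength(0);
            }
        }
        // Keep any trailing text that does not end with a dot.
        String rest = current.toString().trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        // The example text from the question.
        System.out.println(splitOnDots("D R E L I N. Okay."));
    }
}
```

For the example text this yields the two sentences D R E L I N. and Okay., which is the output I would like the ssplit annotator to produce.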
