IllegalArgumentException: PTBLexer: Invalid options key in constructor: asciiQuotes Stanford NLP


I'm trying to run the "Hello World" example of the Stanford POS tagger API in Java on French sentences (I used the same .jar from Python and it worked well). Here is my code:

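import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TextPreprocessor {

    // Load the French UD model once; the path is relative to the working directory
    private static MaxentTagger tagger = new MaxentTagger("../stanford-tagger-4.1.0/stanford-postagger-full-2020-08-06/models/french-ud.tagger");

    public static void main(String[] args) {
        // Tokenize and tag the raw sentence in a single call
        String taggedString = tagger.tagString("Salut à tous, je suis coincé");
        System.out.println(taggedString);
    }
}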

But I get the following exception:

Loading POS tagger from C:/Users/_Nprime496_/Downloads/Compressed/stanford-tagger-4.1.0/stanford-postagger-full-2020-08-06/models/french-ud.tagger ... done [0.3 sec].
Exception in thread "main" java.lang.IllegalArgumentException: PTBLexer: Invalid options key in constructor: asciiQuotes
    at edu.stanford.nlp.process.PTBLexer.<init>(PTBLexer.java)
    at edu.stanford.nlp.process.PTBTokenizer.<init>(PTBTokenizer.java:285)
    at edu.stanford.nlp.process.PTBTokenizer$PTBTokenizerFactory.getTokenizer(PTBTokenizer.java:698)
    at edu.stanford.nlp.process.DocumentPreprocessor$PlainTextIterator.<init>(DocumentPreprocessor.java:271)
    at edu.stanford.nlp.process.DocumentPreprocessor.iterator(DocumentPreprocessor.java:226)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tokenizeText(MaxentTagger.java:1148)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger$TaggerWrapper.apply(MaxentTagger.java:1332)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagString(MaxentTagger.java:999)
    at modules.generation.preprocessing.TextPreprocessor.main(TextPreprocessor.java:19)

Can you help me?

Best answer

Instead of the standalone tagger, you can run the French POS tagger through the full CoreNLP package with this code:

package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;

import java.util.*;


public class PipelineExample {

  public static String text = "Paris est la capitale de la France.";

  public static void main(String[] args) {
    // set up pipeline properties
    Properties props = StringUtils.argsToProperties("-props", "french");
    // set the list of annotators to run
    props.setProperty("annotators", "tokenize,ssplit,mwt,pos");
    // build pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    // create a document object
    CoreDocument document = pipeline.processToCoreDocument(text);
    // display tokens
    for (CoreLabel tok : document.tokens()) {
      System.out.println(String.format("%s\t%s", tok.word(), tok.tag()));
    }
  }

}

You can download CoreNLP here: https://stanfordnlp.github.io/CoreNLP/

Make sure to also download the latest French models jar and add it to your classpath.

I am not sure why your example with the standalone tagger does not work. What jars were you using?
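
To answer the jar question, one thing worth checking is which jar the PTBLexer in your stack trace is actually loaded from. The sketch below is only a diagnostic, and it rests on an assumption the question does not confirm: that an older Stanford jar elsewhere on the classpath is shadowing the one shipped with the 4.1.0 tagger. The class name JarLocator and the method whereIs are made up for this example; the Stanford class names are taken from the stack trace.

```java
import java.security.CodeSource;

public class JarLocator {

  /** Returns the location a class was loaded from, or a short note if it is unknown. */
  public static String whereIs(String className) {
    try {
      // initialize=false: we only want the location, not to run static initializers
      Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
      CodeSource source = cls.getProtectionDomain().getCodeSource();
      if (source == null || source.getLocation() == null) {
        // Typical for classes that ship with the JDK itself
        return "(no code source: JDK/bootstrap class)";
      }
      return source.getLocation().toString();
    } catch (ClassNotFoundException | LinkageError e) {
      return "(not on classpath)";
    }
  }

  public static void main(String[] args) {
    // Class names copied from the stack trace in the question
    String[] names = {
      "edu.stanford.nlp.process.PTBLexer",
      "edu.stanford.nlp.tagger.maxent.MaxentTagger"
    };
    for (String name : names) {
      System.out.println(name + " -> " + whereIs(name));
    }
  }
}
```

Run it with the same classpath as your TextPreprocessor. If the two classes resolve to different jars, or to a jar older than the tagger download, that mismatch would be a plausible explanation for the unrecognized asciiQuotes option.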