Change text in reusable pipeline in DKPro


This question describes how to reuse a pipeline in DKPro, but if I create only one JCas and then try to change its text, I get the following exception:

org.apache.uima.cas.CASRuntimeException: Data for Sofa feature setLocalSofaData() has already been set.

How do I get around this?

Best answer

The sofa data in the CAS can only be set once. It cannot be modified after it has been set.

In order to re-use a CAS, call the reset() method on it. This clears all annotations and allows you to set the sofa/text again.

To build a CAS incrementally, a common strategy is to add annotations to the CAS while appending text to a string buffer, and to set the text only at the end of the process.
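The offset bookkeeping behind that incremental strategy can be sketched in plain Java, without any UIMA classes. The class `IncrementalTextBuilder`, its `Span` type, and the method names are hypothetical and exist only for this illustration. In an actual pipeline the recorded offsets would presumably become the begin/end of annotations added to the CAS, and the final string would be passed to `setDocumentText` exactly once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of UIMA or uimaFIT: collects text and spans
// side by side so that the document text only has to be set once at the end.
public class IncrementalTextBuilder {

    /** A labelled character span; stands in for an annotation's begin/end offsets. */
    public static final class Span {
        public final int begin;
        public final int end;
        public final String label;

        Span(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }
    }

    private final StringBuilder text = new StringBuilder();
    private final List<Span> spans = new ArrayList<>();

    /** Appends a fragment and records a span covering exactly that fragment. */
    public Span append(String fragment, String label) {
        int begin = text.length();
        text.append(fragment);
        Span span = new Span(begin, text.length(), label);
        spans.add(span);
        return span;
    }

    /** Appends text that no span should cover, e.g. separating whitespace. */
    public void appendUnannotated(String fragment) {
        text.append(fragment);
    }

    /** The complete text; in a UIMA setting this is what would be set on the CAS at the end. */
    public String getText() {
        return text.toString();
    }

    /** Read-only view of the spans recorded so far, in insertion order. */
    public List<Span> getSpans() {
        return Collections.unmodifiableList(spans);
    }

    public static void main(String[] args) {
        IncrementalTextBuilder builder = new IncrementalTextBuilder();
        builder.append("Hello", "token");
        builder.appendUnannotated(" ");
        builder.append("world", "token");

        String text = builder.getText();
        for (Span s : builder.getSpans()) {
            // Each span's offsets index into the final text.
            System.out.println(s.label + " [" + s.begin + "," + s.end + ") "
                    + text.substring(s.begin, s.end));
        }
    }
}
```

Because every span is recorded against the buffer length at the time it is appended, the offsets stay valid for the final string, which is why the text can be set last.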

A uimaFIT-based example of re-using a CAS could look something like this:

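// Imports (package names as of uimaFIT 2.x and later)
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

String[] texts = {
    "Hello world.",
    "This is a test." };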

// Create empty CAS/JCas initialized using uimaFIT typesystem auto-detection
JCas jcas = JCasFactory.createJCas();

// Instantiate some analysis engine
AnalysisEngine engine = AnalysisEngineFactory.createEngine(...);

// Process texts re-using the previously created CAS/JCas instance
for (String t : texts) {
    jcas.reset();
    jcas.setDocumentText(t);
    jcas.setDocumentLanguage("en");
    engine.process(jcas);
}

engine.collectionProcessComplete();
engine.destroy();

Disclosure: I am working on the Apache UIMA project.