TreeTagger can do POS tagging as well as text chunking, that is, extracting verbal and nominal clauses, as in this German example:
$ echo 'Das ist ein Test.' | cmd/tagger-chunker-german
    reading parameters ...
    tagging ...
     finished.
<NC>
Das PDS die
</NC>
<VC>
ist VAFIN   sein
</VC>
<NC>
ein ART eine
Test    NN  Test
</NC>
.   $.  .
I'm trying to figure out how to do this with treetaggerwrapper in Python (since it's faster than calling TreeTagger directly), but I can't work out how it's done. The documentation refers to chunking as preprocessing, so I tried this:
tags = tagger.tag_text(u"Dieser Satz ist ein Satz.", prepronly=True)
But the output is just a list of the words with no information added. I'm starting to think that what the wrapper calls chunking is something different from what the actual tagger calls chunking, but maybe I'm just missing something? Any help would be appreciated.
 
                        
The original poster's assumption is correct.
treetaggerwrapper (as of version 2.2.4) defines chunking as merely "preprocessing of text" (see treetaggerwrapper.py), and does not fully wrap TreeTagger's capabilities in this sense.
However, inspecting tagger-chunker-german, one can see that getting clauses and tags is a chain of operations that actually calls TreeTagger 3 times:
$ echo 'Das ist ein Test.' | cmd/tree-tagger-german | perl -nae 'if ($#F==0){print} else {print "$F[0]-$F[1]\n"}' | bin/tree-tagger lib/german-chunker.par -token -sgml -eps 0.00000001 -hyphen-heuristics -quiet | cmd/filter-chunker-output-german.perl | bin/tree-tagger -quiet -token -lemma -sgml lib/german-utf8.par
whereas treetaggerwrapper's tagging command (shown in tagcmdlist) is actually a one-shot call (after its own preprocessing of the text) to:
bin/tree-tagger -token -lemma -sgml -quiet -no-unknown lib/german-utf8.par
The point of entry to extend it for chunking is the line
"tagparfile": "german-utf8.par",
where you would define something like
"chunkingparfile": "german-chunker.par",
and issue an additional call to TreeTagger with this other parfile, following the tagger-chunker-german operation chain. You'd probably still have to copy some extra logic from cmd/filter-chunker-output-german.perl, though.
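Until the wrapper supports this, one workaround is to reproduce the tagger-chunker-german chain from Python with subprocess. The following is only a rough sketch, not part of treetaggerwrapper: the function names (join_word_and_tag, chunker_commands, chunk_german, parse_chunks) are made up for illustration, it assumes a standard TreeTagger layout with cmd/, bin/ and lib/ under one directory, and it assumes the scripts in cmd/ are executable. It runs the filter script as-is instead of porting its logic, and parse_chunks assumes the output format shown in the question.

```python
import subprocess
from pathlib import Path


def join_word_and_tag(tagged):
    """Python port of the perl one-liner in the chain.

    Lines with a single field (e.g. SGML tags) pass through unchanged,
    all other lines become 'word-tag'. Blank lines are dropped, which
    is a simplification made for this sketch.
    """
    out = []
    for line in tagged.splitlines():
        fields = line.split()
        if not fields:
            continue
        out.append(line if len(fields) == 1 else f"{fields[0]}-{fields[1]}")
    return "\n".join(out) + "\n"


def chunker_commands(tt_home):
    """Build the external commands of the chain; tt_home is the TreeTagger directory."""
    home = Path(tt_home)
    tree_tagger = str(home / "bin" / "tree-tagger")
    return {
        "tag": [str(home / "cmd" / "tree-tagger-german")],
        "chunk": [tree_tagger, str(home / "lib" / "german-chunker.par"),
                  "-token", "-sgml", "-eps", "0.00000001",
                  "-hyphen-heuristics", "-quiet"],
        "filter": [str(home / "cmd" / "filter-chunker-output-german.perl")],
        "lemmatize": [tree_tagger, "-quiet", "-token", "-lemma", "-sgml",
                      str(home / "lib" / "german-utf8.par")],
    }


def chunk_german(text, tt_home):
    """Run the whole chain on text and return the raw chunker output."""
    cmds = chunker_commands(tt_home)

    def run(cmd, data):
        # check=True raises CalledProcessError if a step fails
        result = subprocess.run(cmd, input=data, capture_output=True,
                                encoding="utf-8", check=True)
        return result.stdout

    tagged = run(cmds["tag"], text + "\n")
    chunked = run(cmds["chunk"], join_word_and_tag(tagged))
    filtered = run(cmds["filter"], chunked)
    return run(cmds["lemmatize"], filtered)


def parse_chunks(output):
    """Group (word, tag, lemma) tuples under their enclosing chunk tag.

    Tokens outside any chunk (such as the final '.') get the label None.
    """
    chunks = []
    current = None
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("</"):
            current = None
        elif line.startswith("<") and line.endswith(">"):
            current = (line[1:-1], [])
            chunks.append(current)
        else:
            token = tuple(line.split())
            if current is None:
                chunks.append((None, [token]))
            else:
                current[1].append(token)
    return chunks
```

With TreeTagger installed under, say, /opt/treetagger (a placeholder path), parse_chunks(chunk_german("Das ist ein Test.", "/opt/treetagger")) should give one entry per NC/VC block. Note that this spawns several processes per call, so it loses the speed advantage of the wrapper; batching many sentences into one call would help.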