Does anyone know of a tagged corpus or lexicon for using the Brill part-of-speech (POS) tagger with languages other than English?
Thanks!
If you're using NLTK (http://nltk.org/) and coding in Python, you can do as follows. You don't even need to code your own Brill tagger, since it's already in the library: http://nltk.org/_modules/nltk/tag/brill.html
There's a list of corpora with corpus readers already pre-coded in NLTK: http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml
Here's an example of applying the Brill tagger to a Dutch corpus:
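A minimal sketch of what that might look like with the NLTK 3 API, assuming the Alpino Dutch treebank has already been fetched with `nltk.download('alpino')`. The function names `train_brill_on_sents` and `load_alpino_sents`, the `brill24()` template set, the backoff chain, and the sentence limit are all illustrative choices here, not something the library prescribes:

```python
from collections import Counter

from nltk.corpus import alpino  # lazy loader: data is only read on first access
from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger
from nltk.tag.brill import brill24
from nltk.tag.brill_trainer import BrillTaggerTrainer


def train_brill_on_sents(tagged_sents, max_rules=50):
    """Train a Brill tagger on sentences given as [(word, tag), ...] lists.

    Hypothetical helper: the backoff chain below is an assumption.
    """
    # Fall back to the most frequent tag in the training data.
    tag_counts = Counter(tag for sent in tagged_sents for _, tag in sent)
    default = DefaultTagger(tag_counts.most_common(1)[0][0])
    # Baseline tagger: bigram -> unigram -> default.
    unigram = UnigramTagger(tagged_sents, backoff=default)
    bigram = BigramTagger(tagged_sents, backoff=unigram)
    # Learn transformation rules that correct the baseline's mistakes.
    trainer = BrillTaggerTrainer(bigram, brill24(), trace=0)
    return trainer.train(tagged_sents, max_rules=max_rules)


def load_alpino_sents(limit=2000):
    """Return the first `limit` tagged Dutch sentences from Alpino.

    Raises LookupError if the corpus has not been downloaded.
    """
    return list(alpino.tagged_sents()[:limit])
```

With the corpus installed, `train_brill_on_sents(load_alpino_sents())` should return a tagger whose `tag()` method takes a list of Dutch tokens. Training on the full treebank can be slow, which is why the sketch caps the number of sentences.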
In fact, if you've been hardworking enough to read this far, here's a trick for training a Brill tagger in NLTK by just passing in the corpus =)
Essentially, with train_brill_tagger() and train_brill_with_corpus(), you can just do this:
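Neither `train_brill_tagger()` nor `train_brill_with_corpus()` ships with NLTK, so here is one hedged way the two helpers might be written against the NLTK 3 API. The signatures, the 90/10 split, the `tagging_accuracy` helper, and the choice of `brill24()` templates are assumptions made for illustration:

```python
from collections import Counter

from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger
from nltk.tag.brill import brill24
from nltk.tag.brill_trainer import BrillTaggerTrainer


def train_brill_tagger(initial_tagger, train_sents, **kwargs):
    """Wrap an existing tagger with Brill rules learned from train_sents."""
    trainer = BrillTaggerTrainer(
        initial_tagger, brill24(), trace=0, deterministic=True
    )
    return trainer.train(train_sents, **kwargs)


def tagging_accuracy(tagger, gold_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    total = correct = 0
    for sent in gold_sents:
        words = [word for word, _ in sent]
        for (_, predicted), (_, gold) in zip(tagger.tag(words), sent):
            total += 1
            correct += predicted == gold
    return correct / total if total else 0.0


def train_brill_with_corpus(corpus, held_out=0.1, **kwargs):
    """Train on any object exposing tagged_sents(), e.g. an NLTK corpus reader.

    Returns (tagger, accuracy on the held-out tail of the corpus).
    """
    sents = [list(sent) for sent in corpus.tagged_sents() if sent]
    split = max(1, int(len(sents) * (1 - held_out)))
    train, test = sents[:split], sents[split:]
    # Baseline: most frequent tag -> unigram -> bigram.
    tag_counts = Counter(tag for sent in train for _, tag in sent)
    tagger = DefaultTagger(tag_counts.most_common(1)[0][0])
    for ngram_cls in (UnigramTagger, BigramTagger):
        tagger = ngram_cls(train, backoff=tagger)
    brill = train_brill_tagger(tagger, train, **kwargs)
    return brill, tagging_accuracy(brill, test)
```

Any corpus reader with a `tagged_sents()` method can be passed in, for example `train_brill_with_corpus(nltk.corpus.alpino, max_rules=50)` once that corpus is downloaded. Extra keyword arguments are forwarded to `BrillTaggerTrainer.train()`.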