How does TreeTagger get the lemma of a word?

406 Views Asked by Rodrigo Serna Pérez At 05 June 2018 at 11:04

I am using TreeTagger to get the lemmas of Spanish words, but I have noticed that too many words are not lemmatized as they should be. I would like to know how this operation works: does it use techniques such as decision trees or other machine learning algorithms, or does it simply contain a list of words with their corresponding lemmas? Does anyone know? Thanks!


There are 1 best solutions below

Manuel Bickel On 11 June 2018 at 11:17

Based on personal communication by email with H. Schmid, the author of TreeTagger, the answer to your question is:

The lemmatization function is based on the XTAG Project, which includes a morphological analyzer. Within the XTAG project, several corpora have been analyzed. For TreeTagger, the analysis of the Penn Treebank corpus seems especially relevant, since this corpus is the training corpus for TreeTagger's English parameter file. As for lemmatization, the lemmata have simply been stored in a lexicon, which TreeTagger then uses as a lookup table.

Hence, with TreeTagger you can only retrieve the lemmata that are available in the lexicon.
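To illustrate what a pure lookup implies, here is a minimal Python sketch. It is not TreeTagger's code: the lexicon entries, the tag names, the `lookup_lemma` function, and the `<unknown>` marker used here are all assumptions made for the example. The point is only that a form missing from the table cannot be lemmatized, however regular its morphology.

```python
# Hypothetical toy lexicon mapping (word form, POS tag) -> lemma.
# Entries and tag names are illustrative only; they are not taken from
# TreeTagger's Spanish parameter file.
LEXICON = {
    ("casas", "NC"): "casa",
    ("niños", "NC"): "niño",
    ("corrió", "VLfin"): "correr",
}

# Marker returned for forms that are not in the lexicon (an assumption
# of this sketch).
UNKNOWN = "<unknown>"


def lookup_lemma(word, tag, lexicon=LEXICON):
    """Return the stored lemma for (word, tag), or UNKNOWN if absent.

    No rules or learned model are applied: the lemma is either in the
    table or it is not.
    """
    return lexicon.get((word, tag), UNKNOWN)


if __name__ == "__main__":
    for word, tag in [("casas", "NC"), ("corrió", "VLfin"), ("tuiteando", "VLger")]:
        print(word, tag, lookup_lemma(word, tag), sep="\t")
```

In this sketch, "tuiteando" gets no lemma even though its ending is perfectly regular, which matches the kind of gaps described in the question.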

If you need lemmatization functionality beyond what TreeTagger offers, you will need a morphological analyzer and, depending on your approach, a suitable training corpus. The latter does not seem mandatory, however, since several analyzers perform quite well even when applied directly to the corpus of interest.
