I made a word classifier with nlpnet (http://nilc.icmc.usp.br/nlpnet/index.html). The goal is to extract only the individual words that carry a given tag.
Code
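import nlpnet
import codecs
import itertools

# Load the Portuguese POS tagger; 'pos-pt' is the directory holding the model files.
TAGGER = nlpnet.POSTagger('pos-pt', language='pt')

def TAGGER_txt(text):
    # tag() returns one list of (word, tag) pairs per sentence.
    return list(TAGGER.tag(text))

with codecs.open('document.txt', encoding='utf8') as original_file:
    with codecs.open('document_teste.txt', 'w', encoding='utf8') as output_file:
        for line in original_file.readlines():
            print(line)
            words = TAGGER_txt(line)
            # Flatten the per-sentence lists into one list of (word, tag) pairs.
            all_words = list(itertools.chain(*words))
            # Keep only the words tagged 'N' (noun).
            nouns = [word[0] for word in all_words if word[1] == 'N']
            print(nouns)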
Result
O gato esta querendo comer o ratão
['gato', 'ratão']
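The loop above opens document_teste.txt but only prints its results. As a rough sketch of how the extracted words could also be written to that file, the block below uses a hypothetical `tag_line` function in place of `TAGGER.tag` so it runs without nlpnet; the tags it assigns and the one-line-per-input-line output layout are assumptions, not part of the original code.

```python
import codecs
import os
import tempfile


def tag_line(line):
    """Hypothetical stand-in for TAGGER.tag(line): one sentence of (word, tag) pairs."""
    assumed_nouns = {u"gato", u"ratão"}  # assumed tags, for illustration only
    return [[(w, "N" if w in assumed_nouns else "X") for w in line.split()]]


def extract_nouns(in_path, out_path, wanted_tag="N"):
    """Write the words tagged wanted_tag to out_path, one output line per input line."""
    results = []
    with codecs.open(in_path, encoding="utf8") as src, \
            codecs.open(out_path, "w", encoding="utf8") as dst:
        for line in src:
            nouns = [w for sent in tag_line(line) for w, tag in sent if tag == wanted_tag]
            dst.write(u" ".join(nouns) + u"\n")
            results.append(nouns)
    return results


# Usage with a throwaway input file.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "document.txt")
out_path = os.path.join(workdir, "document_teste.txt")
with codecs.open(in_path, "w", encoding="utf8") as f:
    f.write(u"O gato esta querendo comer o ratão\n")

result = extract_nouns(in_path, out_path)
print(result)
```

With the real tagger, `tag_line(line)` would be replaced by `TAGGER.tag(line)`, and `wanted_tag` could be set to whichever tag is of interest.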
I think this could be the essence of what you need. Please see edited version.
As you say in your question, the result of tagging Sentence would be something like tagged. If you wanted just the nouns from Sentence, you could recover them using the expression after nouns =.
Output:
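A minimal sketch of that kind of expression, assuming `tagged` has the shape the tagger returns (a list of sentences, each a list of (word, tag) pairs). The value of `tagged` here is written by hand for illustration, and the tags shown are assumed rather than produced by nlpnet:

```python
# -*- coding: utf-8 -*-
Sentence = u"O gato esta querendo comer o ratão"

# Hand-written stand-in for the tagger's result on Sentence (assumed tags).
tagged = [[
    (u"O", "ART"), (u"gato", "N"), (u"esta", "V"), (u"querendo", "V"),
    (u"comer", "V"), (u"o", "ART"), (u"ratão", "N"),
]]

# Walk every sentence and keep the words whose tag is 'N'.
nouns = [word for sentence in tagged for (word, tag) in sentence if tag == "N"]
print(nouns)
```

Changing the comparison to another tag, such as `"V"`, would select the verbs instead.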
Edit: It's not clear to me what you want. Here's another possibility.
codecs.open
..
Output: