Create a POS-tagged corpus with NLTK


I want to build a POS-tagged corpus with NLTK so that I can train my model on it.

So far I have looked at many sources, but each one only explains how to read an existing tagged corpus and get its words, sentences, etc. The following is a piece of code I tried:

from nltk.corpus.reader import TaggedCorpusReader
reader = TaggedCorpusReader('/home/abc/nltk_data/', 'pos_tagged.pos')
reader.words()
reader.tagged_words()
reader.sents()

I want to put my corpus in the home/nltk_data/corpora/ folder so that I can import the corpus I created. Please guide me.
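For reference, here is a minimal sketch of what creating such a corpus could look like. It assumes the plain word/TAG text format with one sentence per line, which TaggedCorpusReader reads by default. The sample sentences, the folder name my_tagged_corpus, and the helper build_corpus are made up for illustration, and a temporary directory stands in for the real nltk_data location.

```python
import os
import tempfile

from nltk.corpus.reader import TaggedCorpusReader

# Hypothetical sample data: one sentence per line, each token written as word/TAG.
SAMPLE = "The/DT cat/NN sat/VBD ./.\nDogs/NNS bark/VBP ./.\n"


def build_corpus(root, filename="pos_tagged.pos"):
    """Write the sample file under root and return a reader for it."""
    os.makedirs(root, exist_ok=True)
    with open(os.path.join(root, filename), "w", encoding="utf-8") as f:
        f.write(SAMPLE)
    return TaggedCorpusReader(root, [filename])


# A temporary directory stands in for .../nltk_data/corpora/ here.
corpus_root = os.path.join(tempfile.mkdtemp(), "corpora", "my_tagged_corpus")
reader = build_corpus(corpus_root)

tagged_words = list(reader.tagged_words())
tagged_sents = list(reader.tagged_sents())
print(tagged_words)
print(tagged_sents)
```

Since the reader only needs a directory path and a list of file names, pointing the root at a subfolder of your own nltk_data/corpora/ directory should work the same way.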

There is 1 best solution below

I found a working solution for this. Kindly refer to the link for the step-by-step procedure.

Download the necessary files from here.

Once you have followed the commands from link 1, a pickle file will be generated. This file holds the tagger trained on your tagged corpus.

Once the pickle file is generated, you can check whether your tagger works by running the following piece of code:

import nltk.data
tagger = nltk.data.load("taggers/NAME_OF_TAGGER.pickle")
tagger.tag(['some', 'words', 'in', 'a', 'sentence'])
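If the linked procedure is not available, the same idea can be sketched by hand. This is not the procedure from the link, only a rough stand-in: it trains NLTK's UnigramTagger with a DefaultTagger backoff on a few made-up sentences, saves it with pickle, and loads it back. The training data, the "NN" default tag, and the file name my_tagger.pickle are assumptions chosen for illustration. With a real corpus, the training sentences would come from reader.tagged_sents().

```python
import os
import pickle
import tempfile

from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical training data; in practice this would be reader.tagged_sents().
train_sents = [
    [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), (".", ".")],
    [("The", "DT"), ("dog", "NN"), ("ran", "VBD"), (".", ".")],
]

# Words not seen in training fall back to the default tag "NN".
tagger = UnigramTagger(train_sents, backoff=DefaultTagger("NN"))

# Save the trained tagger; the directory and file name are placeholders.
tagger_dir = os.path.join(tempfile.mkdtemp(), "taggers")
os.makedirs(tagger_dir)
tagger_path = os.path.join(tagger_dir, "my_tagger.pickle")
with open(tagger_path, "wb") as f:
    pickle.dump(tagger, f)

# Load it back with the standard pickle module and tag a new sentence.
with open(tagger_path, "rb") as f:
    loaded = pickle.load(f)

result = loaded.tag(["The", "cat", "ran", "fast"])
print(result)
```

The pickle is loaded here with the standard library so that the sketch does not depend on where NLTK searches for data. To use nltk.data.load as shown above, the file would need to sit in a taggers folder inside one of the directories listed in nltk.data.path.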