I want to add new tagged words (local words that are used in our region) and create a new model. I created a .prop file from the command line, but how can I create a .tagger file?
When I tried to create such a file as described on the Stanford website, I got an error like
"No model specified"
What is the -model argument? Is it the corpus? How can I add my new tagged words to it?
How do I train a tagger, then?
The Stanford site says that:
You need to start with a .props file which contains options for the tagger to use. The .props files we used to create the sample taggers are included in the models directory; you can start from whichever one seems closest to the language you want to tag.
For example, to train a new English tagger, start with the left3words tagger props file. To train a tagger for a western language other than English, you can consider the props files for the German or the French taggers, which are included in the full distribution. For languages using a different character set, you can start from the Chinese or Arabic props files. Or you can use the -genprops option to MaxentTagger, and it will write a sample properties file, with documentation, for you to modify. It writes it to stdout, so you'll want to save it to some file by redirecting output (usually with >). The # at the start of the line makes things a comment, so you'll want to delete the # before properties you wish to specify.
The model property specifies the file to which the built model will be saved. You can provide any valid path, e.g. mymodel.tagger. You can use this same properties file at test time, and MaxentTagger will then load from the specified model file rather than saving to it.
To be clear: your training corpus should be provided with the property trainFile. See the tagger properties files included with the Stanford Tagger for examples.
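As a rough sketch of how the model and trainFile properties fit together, the following writes a tiny training file and a minimal props file, then starts training if the tagger jar is present. The file names (train.txt, mytagger.props, mymodel.tagger) are hypothetical, the one-sentence corpus is only a placeholder for a real tagged corpus, and the jar is assumed to be stanford-postagger.jar in the current directory. A real props file would normally carry more options (for example arch), which are best copied from a generated or shipped props file.

```shell
# Hypothetical file names throughout; adjust them to your setup.

# 1. Training corpus: one sentence per line, each token written as word_TAG.
#    (Placeholder only; a usable tagger needs many tagged sentences,
#    including the new regional words with their tags.)
cat > train.txt <<'EOF'
The_DT dog_NN barks_VBZ ._.
EOF

# 2. Minimal props file.
#    model     = where the trained tagger is written (output, not the corpus)
#    trainFile = the tagged corpus to learn from (input)
cat > mytagger.props <<'EOF'
model = mymodel.tagger
trainFile = train.txt
tagSeparator = _
encoding = UTF-8
EOF

# 3. Train. Assumes stanford-postagger.jar sits in the current directory;
#    the step is skipped if the jar is not found.
if [ -f stanford-postagger.jar ]; then
  java -mx1g -classpath stanford-postagger.jar \
    edu.stanford.nlp.tagger.maxent.MaxentTagger -props mytagger.props
fi
```

The "No model specified" error suggests that neither the props file nor the command line gave a value for model. As far as I know, any property can also be passed on the command line, so adding -model mymodel.tagger -trainFile train.txt to the java command should be an alternative to setting them in the file.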