I'm using OpenNLP and it works fine for detecting parts of speech and such when doing this:
try (InputStream modelIn = new FileInputStream("en-pos-maxent.bin")) {
    POSModel model = new POSModel(modelIn);
    POSTaggerME tagger = new POSTaggerME(model);
    String[] tags = tagger.tag(tokenList);
}
So if tokenList = [Test, Recipe, of, Incredible, Goodness, .], then tags = [ADJ, NOUN, ADP, ADJ, NOUN, PUNCT].
Can I add more tags beyond those defined as parts of speech? What if I want to add a tag for short words, products, food, etc.?
Would I need to build a custom POS model with my own definitions, run it in addition to the English POS model, and keep a separate tag array for each model that I run the sentence through?
I have already tried what I described: I defined my own model and ran it, so that I end up with multiple tag arrays. I was just wondering whether there is a better way to do this.
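For reference, the "one tag array per model" approach described above could look roughly like the sketch below. LayeredTagger and the layer names are made up for illustration and are not part of OpenNLP. Each tagger is modelled as a plain function from tokens to tags, so the sketch runs without the OpenNLP jar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not an OpenNLP class): runs several taggers over the
// same tokens and keeps one tag array per named layer.
final class LayeredTagger {
    private final Map<String, Function<String[], String[]>> layers = new LinkedHashMap<>();

    // Registers a tagger under a layer name; returns this for chaining.
    LayeredTagger addLayer(String name, Function<String[], String[]> tagger) {
        layers.put(name, tagger);
        return this;
    }

    // Runs every registered tagger and returns layer name -> tags,
    // in registration order.
    Map<String, String[]> tagAll(String[] tokens) {
        Map<String, String[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, Function<String[], String[]>> e : layers.entrySet()) {
            String[] tags = e.getValue().apply(tokens);
            if (tags.length != tokens.length) {
                throw new IllegalStateException(
                        "layer '" + e.getKey() + "' did not return one tag per token");
            }
            result.put(e.getKey(), tags);
        }
        return result;
    }
}
```

With OpenNLP, the one-argument POSTaggerME.tag(String[]) method should fit that function shape, so something like addLayer("pos", tagger::tag).addLayer("custom", customTagger::tag) ought to work, assuming customTagger is a second POSTaggerME built from the custom model.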
I decided to tackle it the following way. My limited knowledge does not seem to be a limitation here.
I was using POSSample as the object that stores tags and tokens together. I created a different object, modelled on POSSample, that stores the same tags and tokens in a HashMap but can also hold whatever other data I want to put in there, such as lemmas, custom tags, etc.
The tag types I use as the hash key are just an enum whose values I can use in bitwise operations, so I can easily flag whether or not I want these extra tags to be scanned for and populated in my sentence tagger.
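A minimal sketch of that design, under some assumptions: the names TagLayer, TaggedSentence and SentenceTagger are invented for this example, the per-layer taggers are plain functions from tokens to tags, and an EnumMap is used in place of a HashMap because the keys are enum constants.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical layer names; each constant owns one bit so layers can be OR-ed together.
enum TagLayer {
    POS, LEMMA, CUSTOM;

    int mask() {
        return 1 << ordinal();
    }

    static boolean isSet(int flags, TagLayer layer) {
        return (flags & layer.mask()) != 0;
    }
}

// Stand-in for the POSSample-like container: tokens plus one tag array per layer.
final class TaggedSentence {
    private final String[] tokens;
    private final Map<TagLayer, String[]> layers = new EnumMap<>(TagLayer.class);

    TaggedSentence(String[] tokens) {
        this.tokens = tokens.clone();
    }

    String[] tokens() {
        return tokens.clone();
    }

    void put(TagLayer layer, String[] tags) {
        if (tags.length != tokens.length) {
            throw new IllegalArgumentException("expected one tag per token for " + layer);
        }
        layers.put(layer, tags.clone());
    }

    boolean has(TagLayer layer) {
        return layers.containsKey(layer);
    }

    // Returns null when the layer was never populated.
    String[] get(TagLayer layer) {
        return layers.get(layer);
    }
}

// Runs only the layers whose bit is set in the flags argument.
final class SentenceTagger {
    private final Map<TagLayer, Function<String[], String[]>> taggers =
            new EnumMap<>(TagLayer.class);

    SentenceTagger(Map<TagLayer, Function<String[], String[]>> taggers) {
        this.taggers.putAll(taggers);
    }

    TaggedSentence tag(String[] tokens, int flags) {
        TaggedSentence sentence = new TaggedSentence(tokens);
        for (Map.Entry<TagLayer, Function<String[], String[]>> e : taggers.entrySet()) {
            if (TagLayer.isSet(flags, e.getKey())) {
                sentence.put(e.getKey(), e.getValue().apply(tokens));
            }
        }
        return sentence;
    }
}
```

Calling tag(tokens, TagLayer.POS.mask()) fills only the POS layer, while tag(tokens, TagLayer.POS.mask() | TagLayer.LEMMA.mask()) fills both. If the bitwise flags are not needed elsewhere, java.util.EnumSet would express the same selection without manual masks.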
Done this way, LEMS, for instance, would normally not get populated unless my method specified it: