I analyse press reviews with a Java program, using Boilerpipe. I use Pattern and Matcher to extract keywords from the text.
My problem is that I also need to extract some enterprise names (WHO, Total, 2A, SEE, ARE...), and as you can see, these names also exist as common words. So I get results such as "who", "see", "are"... even when the article doesn't mention the enterprises. Do you have any idea how I could solve this problem (maybe by analysing the neighborhood of the word...)?
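
One possible starting point, sketched under the assumption that the articles write the enterprise names with their exact capitalisation: match them case-sensitively and only on word boundaries. The class name `EnterpriseMatcher` and the method `findEnterprises` are hypothetical, chosen for illustration, and the name list is just the one given above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EnterpriseMatcher {

    // Hypothetical list built from the names in the question.
    // No CASE_INSENSITIVE flag, so "who" does not match "WHO".
    // \b keeps the pattern from matching inside longer words.
    private static final Pattern ENTERPRISES =
            Pattern.compile("\\b(?:WHO|Total|2A|SEE|ARE)\\b");

    // Returns every enterprise name found in the text, in order of appearance.
    public static List<String> findEnterprises(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ENTERPRISES.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "We see who the partners are: WHO and Total signed with 2A.";
        System.out.println(findEnterprises(text)); // prints [WHO, Total, 2A]
    }
}
```

This only filters on capitalisation. It would still report "Total" when that word merely starts a sentence, and it would report "SEE" or "ARE" in an all-caps headline, so some look at the neighbouring words would still be needed for those cases.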