For a large set of articles, I want to build topic models that assign a weight to each topic and, within each topic, a weight to each sub-topic. For example, if I feed in an article that falls in both the Business and Technology domains, the program's output should look something like this:
- 0.593 Business ( 0.438 - Marketing, 0.375 - Companies, 0.062 - Office Work )
- 0.148 Technology ( 0.500 - Technology by type, 0.250 - High_technology Business Districts, 0.250 - Technology Companies )
- 0.111 Society ( 0.333 - Organizations, 0.333 - Technology in Society, 0.333 - Labor )
What are the best open-source natural language processing tools that can do this?
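You can treat this as a classification task and use the open-source NLTK (Natural Language Toolkit), whose probabilistic classifiers can return a probability for each label.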
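Here is a minimal sketch of the two-level idea, assuming you have training articles labeled with both a topic and a sub-topic (classification is supervised, so it needs labeled data). It uses only the Python standard library so it runs without NLTK installed. In NLTK the equivalent building blocks are `nltk.NaiveBayesClassifier.train`, which takes a list of `(feature_dict, label)` pairs, and `prob_classify`, whose result's `.prob(label)` gives the weight for a label. The toy training sentences, the `describe` output format, and the choice of one sub-topic classifier per topic are hypothetical choices for illustration, not something NLTK provides out of the box.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, labeled_docs):
        # labeled_docs: list of (text, label) pairs
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        self.total_docs = sum(self.doc_counts.values())

    def prob_classify(self, text):
        """Return {label: probability}; the probabilities sum to 1."""
        words = [w for w in tokenize(text) if w in self.vocab]  # ignore unseen words
        v = len(self.vocab)
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for word in words:
                score += math.log((counts[word] + 1) / (total + v))
            log_scores[label] = score
        # Normalize in log space (log-sum-exp) to avoid underflow on long articles
        best = max(log_scores.values())
        unnorm = {label: math.exp(s - best) for label, s in log_scores.items()}
        z = sum(unnorm.values())
        return {label: u / z for label, u in unnorm.items()}


def train_hierarchy(examples):
    """examples: (text, topic, subtopic) triples -> (topic model, {topic: sub-topic model})."""
    top = NaiveBayes([(text, topic) for text, topic, _ in examples])
    by_topic = defaultdict(list)
    for text, topic, sub in examples:
        by_topic[topic].append((text, sub))
    subs = {topic: NaiveBayes(docs) for topic, docs in by_topic.items()}
    return top, subs


def describe(text, top, subs, k=3):
    """Format the k most likely topics, each with its sub-topic weights given that topic."""
    lines = []
    topics = sorted(top.prob_classify(text).items(), key=lambda kv: kv[1], reverse=True)
    for topic, p in topics[:k]:
        sub_probs = sorted(subs[topic].prob_classify(text).items(),
                           key=lambda kv: kv[1], reverse=True)
        inner = ", ".join(f"{q:.3f} - {sub}" for sub, q in sub_probs)
        lines.append(f"- {p:.3f} {topic} ( {inner} )")
    return "\n".join(lines)


# Hypothetical toy training data; a real system needs many labeled articles per class.
EXAMPLES = [
    ("quarterly marketing campaign boosts brand sales", "Business", "Marketing"),
    ("companies report profits and merger plans", "Business", "Companies"),
    ("office work meetings and email schedules", "Business", "Office Work"),
    ("new smartphone chip and software release", "Technology", "Technology by type"),
    ("tech companies hire software engineers", "Technology", "Technology Companies"),
    ("labor unions negotiate wages for workers", "Society", "Labor"),
    ("nonprofit organizations support communities", "Society", "Organizations"),
]

top, subs = train_hierarchy(EXAMPLES)
print(describe("marketing campaign brand sales", top, subs))
```

The top-level weights come from the topic model and each parenthesized list from the sub-topic model of that topic, so the sub-topic weights are conditional on the topic and sum to 1 within it. Training a separate small model per topic keeps each one focused; the trade-off is that a wrong top-level guess also picks the wrong sub-topic model.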