Text classification & topic modelling

572 Views Asked by user4993746 At 16 June 2015 at 13:27

For a large set of articles, I want to build topic models that assign a weight to each topic and, within each topic, a weight to each sub-topic. For example, if I feed in an article that falls in both the Business and Technology domains, the program's output should look something like this:

0.593 Business ( 0.438 - Marketing , 0.375 - Companies, 0.062 - Office Work)
0.148 Technology ( 0.500 - Technology by type, 0.250 - High_technology Business Districts, 0.250 - Technology Companies)
0.111 Society ( 0.333 - Organizations, 0.333 - Technology in Society, 0.333 - Labor)

What are the best open-source language processing programs that can do this?


There are 3 best solutions below

skaz On 16 June 2015 at 13:28

You can classify the articles using NLTK, the open-source Natural Language Toolkit.
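As a rough illustration of what classification with NLTK could look like, here is a minimal sketch using its Naive Bayes classifier. The training texts, the labels, and the `bag_of_words` helper are made up for this example; a real setup would need a much larger labelled corpus and better feature extraction.

```python
import nltk

def bag_of_words(text):
    """Hypothetical helper: turn a text into the feature dict NLTK classifiers expect."""
    return {word: True for word in text.lower().split()}

# Tiny hand-made training set, purely illustrative.
training = [
    ("marketing budget sales revenue", "Business"),
    ("company profits quarterly revenue", "Business"),
    ("software hardware processor chip", "Technology"),
    ("smartphone software update release", "Technology"),
]
train_set = [(bag_of_words(text), label) for text, label in training]

classifier = nltk.NaiveBayesClassifier.train(train_set)

# prob_classify returns a probability distribution over the labels,
# which gives one weight per category instead of a single hard label.
dist = classifier.prob_classify(bag_of_words("software company revenue"))
weights = {label: dist.prob(label) for label in dist.samples()}

for label, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{weight:.3f} {label}")
```

Note that this only covers the top-level weights; sub-topic weights would need either a second classifier per category or a topic model.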

jgloves On 16 June 2015 at 14:56

I would give NLTK a try, but scikit-learn, even though it has a steeper learning curve than NLTK, is probably a better bet. It's much more configurable.

http://scikit-learn.org/stable/documentation.html

Sir Cornflakes On 18 June 2015 at 09:21

Several programs handle part of this task; as a starting point I recommend MALLET. Note that any topic modeling program gives you the topics in the form you want, i.e.,

 ( 0.438 - Marketing , 0.375 - Companies, 0.062 - Office Work)

but you need to assign the labels (in this example, Business) yourself. MALLET also gives you a decomposition of each text into topics (identified by numbers, not by labels).
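To make the labelling step concrete, here is a small sketch of how numbered topics could be combined with hand-assigned labels to produce output in the format from the question. It does not read MALLET's files; the dictionaries below are hypothetical stand-ins for whatever your topic modeling tool produces, with the numbers borrowed from the question's example.

```python
def format_topics(doc_weights, topic_terms, labels):
    """Render one line per topic, highest document weight first.

    doc_weights: {topic_id: weight of that topic in the document}
    topic_terms: {topic_id: [(term, weight), ...]}
    labels:      {topic_id: name you assigned by hand}
    """
    lines = []
    ranked = sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True)
    for topic_id, weight in ranked:
        # Fall back to the topic number when no label has been assigned yet.
        label = labels.get(topic_id, f"topic {topic_id}")
        terms = ", ".join(f"{w:.3f} - {term}" for term, w in topic_terms[topic_id])
        lines.append(f"{weight:.3f} {label} ({terms})")
    return lines

# Hypothetical model output for one article.
doc_weights = {0: 0.148, 1: 0.593}
topic_terms = {
    0: [("Technology Companies", 0.250)],
    1: [("Marketing", 0.438), ("Companies", 0.375)],
}
labels = {1: "Business"}  # topic 0 is deliberately left unlabelled

for line in format_topics(doc_weights, topic_terms, labels):
    print(line)
```

Keeping the labels in a separate mapping means you can rerun the model and only revisit the labels for topics whose top terms changed.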
